ABSTRACT


This  thesis  reviews  various  techniques  for  determining 
the  meaning  of  statements  written  in  a  programming  language 
using  the  syntax  specifications  of  the  language.  The  review 
covers  the  basic  theory  of  phrase  structure  grammars,  syntax 
specifications, and syntax directed analyzers. Working models
of the three main syntax directed analyzers (conventional,
multiple parse, and transition diagrams) were constructed and
tested using APL.


CHAPTER I


INTRODUCTION 

1.1 Communication and Translation

Whenever  a  person  wishes  to  communicate  with  another 
it  is  necessary  for  him  to  send  a  message  which  the  receiver 
can  understand.  If  the  people  have  a  common  language  this 
process is simple. If their languages are different a
translation must be made.

The  translation  process  is  direct  if  there  is  a  one- 
to-one  correspondence  of  the  words  in  the  two  languages. 
However,  a  word  usually  has  a  number  of  meanings  in  one 
language.  These  may  be  represented  by  a  set  of  words  in 
the other language. In order for the translator to decide
which meaning the sender intended it is necessary to consider
the word in relation to surrounding words, i.e. determine
the meaning of the word by information supplied by its
context.

The  same  general  concept  of  communication  applies  to 
computers  and  computer  programs.  The  programmer  plays  the 
role of the sender. His programming language is a combination
of natural language and mathematical notation. The
computer is the receiver. Its language is a numerical
representation of the instructions it can perform. If a
programmer has a problem which can be described in a
programming language it can only be run on a computer if a
translation is made.

Early  computers  required  that  this  translation  be 
done  by  the  programmer.  Later,  with  the  development  of 
larger  and  faster  computers,  it  became  feasible  to  assign 
the translation process to the computer itself. Iverson
(1962) places such translators in the following categories:
compilers, assemblers, generators, and interpreters. A compiler
accepts a program expressed in an argument or source
language and reproduces the program in a function language,
usually machine code, to be run later. Assemblers are
special compilers in which the statements of the program
are virtually independent of each other. Thus, the statements
can be considered one at a time and are simple (not
compound), so there are no contextual meanings. Assemblers
generally are used to translate so-called "symbolic" programs
into machine code. A generator produces any one of a set of
function programs based on a parameter which is supplied to
it. Generators are frequently incorporated in compilers.

An  interpreter  executes  the  segment  of  a  function  program 
corresponding  to  a  statement  of  the  argument  program 
immediately  after  it  is  produced.  The  statements  of  the 
argument  program  are  selected  in  a  sequence  determined  by 
the function program.

The conventional approach to translator writing requires
a separate translator for each combination of computer and
programming language desired. This method of producing a
translator is expensive and time consuming, as the rules of the
language are embedded in the structure of the translator program
itself. The proliferation of computers and languages has made
this procedure uneconomical, and various alternatives have been
suggested.

1.2 Linguistics: A New Approach in Translators

In 1960 a new approach to compiler writing began to
evolve. Linguistic theory had developed the principle of
phrase structure grammars as a tool for the study of the
structure of natural languages. Although compiler writers
had indirectly used the structure of a language in their
translators, linguistic theory now provided the basis for
an organized use of structure in translators. (Metcalfe
(1964), Davis (1966))

The  production  of  a  programming  language  system  for  a 
computer  requires  a  union  of  the  definition  of  the  language, 
the  design  of  the  translator,  and  the  characteristics  of 
the computer. Prior to 1960 these factors were combined
in the development of the translator. Syntax oriented
translators use aspects of linguistic theory to separate the
definition of the language and the design of the translator.
The first separate definition of a programming language took
place with the introduction of ALGOL. In 1961 Irons
demonstrated the feasibility of syntax oriented translators by
producing a compiler which worked from a set of specifications
for a language. Since then various means of achieving this
separation have been developed. (Irons (1961), Davis (1966))

Although  syntax  oriented  techniques  are  developing 
rapidly  in  a  number  of  directions,  all  the  methods  stem  from 
phrase  structure  grammars  which  comprise  only  one  facet  of 
linguistics.  This  thesis  explains  the  important  aspects  of 
phrase  structure  grammars  and  considers  topics  related  to 
the  use  of  such  grammars  in  translation  systems.  In  particular, 
it will review certain syntax oriented techniques for
recognizing structures of a program based on specifications of the
syntax  of  the  language  involved. 

The  study  of  recognition  algorithms  consists  mainly  of 
working  models  in  a  programming  language  called  APL.  This 
language  is  an  automated  version  of  a  notation  which  was 
first described in Iverson's "A Programming Language" (1962).

APL  was  chosen  because  its  concise  notation  enables 
one  to  describe  a  system  in  detail  and  at  the  same  time 
provides an operating model which can be run on a computer.
In addition, APL is used in a time-sharing environment, which
permits easy modifications to models and an immediate
determination of the effects.


CHAPTER II


LANGUAGE  AND  MEANING 

2.1 Linguistic Definitions and Programming Languages

Syntax  analysis  of  programming  languages  is  a  result 
of  the  similarity  between  programming  languages  and  natural 
languages.  Because  programming  language  analysis  has 
borrowed  many  terms  from  natural  language  analysis,  it  is 
advantageous  to  consider  the  main  terms  that  have  developed 
in  linguistic  theory. 

A  written  language  conveys  meaning  by  means  of  objects 
or  marks  which  are  catenated  to  form  strings.  The  syntax 
of  a  language  refers  to  the  linear  arrangement  of  these 
objects.  A  rule  of  syntax  states  some  permissible  (or 
prohibited)  relation  between  objects.  The  grammar  of  the 
language  is  the  set  of  syntactic  rules.  Semantics  defines 
the  relationship  between  an  object  and  the  set  of  meanings 
attributed  to  the  object.  A  symbol  is  an  object  to  which 
at  least  one  meaning  has  been  attributed.  Pragmatics  defines 
the  relation  between  a  symbol  and  its  user.  An  object  must 
have  at  least  one  meaning  to  be  of  value  to  a  user.  A  rule 
of  pragmatics  is  applied  by  a  user  to  select  from  the  set  of 
meanings  attributed  to  a  symbol,  that  particular  meaning 
which  is  significant  to  a  particular  user  at  a  particular 
time. A sentence is the smallest unit which a meaningful
string can form in natural language. (Ingerman (1966))
The parse of a sentence indicates what rules of the grammar
were used to form the sentence.

Natural languages are more or less capable of describing
the wide range of topics encountered by humans. Furthermore,
much meaning is often contained in context within a
sentence.  Programming  languages  on  the  other  hand  need  only 
describe  the  limited  number  of  operations  which  can  be  performed 
by  a  computer  and  the  operands  which  are  used.  The  operands 
are  the  names  of  memory  locations,  registers,  or  external 
devices.  The  linguistic  terms  can  now  be  defined  more  formally 
and  simply. 

Language  now  becomes  a  method  of  describing  a  process 
through  the  use  of  symbols  which  represent  operations  and 
operands.  Syntax  is  concerned  with  the  arrangement  of  symbols, 
independent  of  their  meaning.  Semantics,  which  relates  symbols 
and  their  meanings,  is  restricted  in  that  each  symbol  has  only 
a  small  number  of  possible  well-defined  meanings.  Pragmatics 
is  concerned  with  how  the  translator  will  select  the  meaning 
of  a  symbol  in  the  source  language.  The  smallest  unit  for  a 
meaningful  string  in  a  programming  language  is  called  a 
statement rather than a sentence. (Gorn (1961))

2.2 Structure and Meaning

The  translation  of  a  string  of  symbols  from  one 
language  to  another  consists  of  preparing  a  string  of  symbols 
in  the  second  language  which  has  the  same  meaning  as  the 
original.  Attempts  at  having  computers  translate  natural 
languages  have  not  been  satisfactory  because  of  the  wide 
variation  in  permitted  structures  and  the  extensive  use  of 
contextual  meaning.  In  order  to  assign  the  translation  of 
programming  languages  to  computers  it  was  necessary  to  define 
the  structure  of  statements  which  could  be  used  to  convey  a 
given meaning. Thus, determining the structure of a statement
is equivalent to determining the meaning of the statement.

The  meaning  of  a  statement  in  a  programming  language 
is  absolute  or  deterministic  in  that  it  can  be  uniquely 
explained  in  terms  of  changes  which  are  effected  on  a  certain 
set  of  variables  by  obeying  the  statement.  For  example, 
execution of the statement A←B+C will result in the current
value  of  the  variable  A  being  replaced  by  the  sum  of  the 
values  of  variables  B  and  C.  Since  machine  language  can 
describe  all  basic  operations  on  variables,  the  translation 
process  can  be  well  defined.  (Wirth  and  Weber  (1966)) 
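
The deterministic meaning of the assignment statement in this example can be illustrated with a minimal sketch. It is written in Python rather than APL purely for illustration; the dictionary `env` and the function `assign_sum` are hypothetical names introduced here and do not come from the thesis.

```python
# Hypothetical environment: variable names mapped to their current values.
env = {"A": 0, "B": 2, "C": 3}

def assign_sum(env, target, left, right):
    """Replace env[target] by env[left] + env[right]; nothing else changes."""
    env[target] = env[left] + env[right]
    return env

# Obeying the statement "A receives B plus C" changes only the value of A.
assign_sum(env, "A", "B", "C")
```

The meaning of the statement is exactly the difference between the environment before and after the call, which is what makes a mechanical translation well defined.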

In  order  to  determine  the  meaning  of  statements  written 
in  a  programming  language,  three  topics  must  be  considered. 
First, the restrictions and rules which define the structures
in the language must be presented. Second, these rules must
be organized so they can be used by a translator. Finally,
algorithms  must  be  designed  and  developed  for  the  parsing  of 
statements  so  as  to  determine  their  structure  in  relation  to 
the  rules. 


CHAPTER  III 


PROGRAMMING  LANGUAGE  GRAMMARS 

3.1 Phrase Structure Grammars

Formal  grammars,  which  define  languages  suitable  for 
automatic  translation,  can  be  classified  by  the  restrictions 
placed  on  the  syntactic  structures  in  the  language.  One 
classification  is:  phrase  structure  grammars,  standard 
form  grammars,  bounded  context  grammars,  operator  grammars 
and  precedence  grammars.  Of  these,  phrase  structure  grammars 
are  the  most  general  and  will  be  defined  and  described  in 
detail.  The  remaining  grammars  will  then  be  discussed. 

The  following  account  of  phrase  structure  programming 
languages  is  based  on  the  definitions  given  by  Wirth  and 
Weber (1966).

A vocabulary $V$ is a set of symbols denoted by capital
Latin letters $S$, $T$, $U$, etc. Finite sequences of symbols,
including the empty sequence $\Lambda$, are called strings and
are denoted by small Latin letters $x$, $y$, $z$, etc. The set
of all strings over $V$ is denoted by $V^*$, and $V \subset V^*$.

A simple phrase structure system is an ordered pair
$(V, R)$ where $V$ is a vocabulary and $R$ is a finite set of
syntactic rules or productions $r$ of the form $U \to x$ where
$x \ne U$, $U \in V$, and $x \in V^*$. For $r = U \to x$, $U$ is called the
left part and $x$ is the right part of $r$. The component $U$
is called the metaresult and the components of $x$ are called
the metacomponents.

The string $y$ directly produces $z$ ($y \Rightarrow z$) and conversely
$z$ directly reduces into $y$, if and only if there
exist strings $u$, $v$ such that $y = uUv$ and $z = uxv$ and
the rule $U \to x$ is an element of $R$. $y$ produces $z$ ($y \overset{*}{\Rightarrow} z$)
and conversely $z$ reduces into $y$ if and only if there exists
a sequence of strings $x_0, \ldots, x_n$ such that $y = x_0$, $x_n = z$
and

$x_{i-1} \Rightarrow x_i \quad (i = 1, \ldots, n;\ n \ge 1)$.

In this case $z$ is a derivation of $y$.
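
The two relations can be made concrete with a short sketch, given here in Python for illustration only. Strings are represented as tuples of symbols and rules as (left part, right part) pairs; the names `direct_productions`, `produces`, `RULES` and the bound `max_steps` are assumptions of this sketch. The pruning by length is sound only under the further assumption that no right part is empty, so that strings never shrink.

```python
def direct_productions(y, rules):
    """Return every string z such that y directly produces z.

    y is a tuple of symbols; rules is a list of (U, x) pairs for U -> x.
    Each result replaces one occurrence of U in y = uUv by x, giving uxv.
    """
    results = set()
    for i, symbol in enumerate(y):
        for left, right in rules:
            if symbol == left:
                results.add(y[:i] + tuple(right) + y[i + 1:])
    return results

def produces(y, z, rules, max_steps=8):
    """Bounded search for a sequence x0 = y, ..., xn = z with n >= 1."""
    target = tuple(z)
    frontier = {tuple(y)}
    for _ in range(max_steps):
        # Right parts are assumed non-empty, so longer strings can be dropped.
        frontier = {s for f in frontier
                    for s in direct_productions(f, rules)
                    if len(s) <= len(target)}
        if target in frontier:
            return True
    return False

# A small illustrative rule set (not taken from the thesis).
RULES = [("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("a",))]
```

With these rules, `produces(("E",), ("a", "+", "a"), RULES)` holds through the sequence E, E + T, T + T, a + T, a + a, whereas no sequence leads from T to a + a.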

A simple phrase structure grammar is an ordered
quadruple $G = (V, R, B, A)$. $V$ and $R$ form a phrase
structure system and $B$ is a subset of $V$ such that none
of the elements of $B$ (called basic or terminal symbols)
occurs as the left part of any rule of $R$. All elements of
$V - B$ (called non-terminal symbols) occur as the left part
of at least one rule. $A$ is the symbol which occurs in no
right part of any rule of $R$ and is referred to as the head
of the language.

The letter $U$ (or $U_i$) will denote a non-terminal
symbol, i.e. $U \in V - B$.

$x$ is a sentence of $G$ if $x \in B^*$, i.e. $x$ is a
string of basic symbols, and $A \overset{*}{\Rightarrow} x$ ($A$ produces $x$).

A simple phrase structure language $L$ is the set of
all strings $x$ of basic symbols which can be produced by $(V, R)$ from $A$:

$L(G) = \{\, x \mid A \overset{*}{\Rightarrow} x \;\wedge\; x \in B^* \,\}$.

Let $U \overset{*}{\Rightarrow} z$. A parse of the string $z$ into the symbol
$U$ is a sequence of syntactic rules $r_1, r_2, \ldots, r_n$ such
that $r_j$ directly reduces $z_{j-1}$ into $z_j$ ($j = 1, \ldots, n$) and
$z = z_0$, $z_n = U$. A canonical parse is a parse which proceeds
strictly from left to right in a sentence and reduces a leftmost
part of a sentence as far as possible before proceeding
further to the right.

If $z_k = U_1 U_2 \ldots U_m$ (for some $1 \le k \le n$), then
$z_i$ ($i < k$) must be of the form $z_i = u_1 u_2 \ldots u_m$ where for
each $s = 1, \ldots, m$ either $U_s \overset{*}{\Rightarrow} u_s$ or $U_s = u_s$. The canonical
form of the section of the parse reducing $z_i$ into $z_k$ shall
be $\underline{r}_1, \underline{r}_2, \ldots, \underline{r}_m$ where the sequence $\{\underline{r}_s\}$ is the canonical
form of the section of the parse reducing $u_s$ into $U_s$. Clearly
$\{\underline{r}_s\}$ is empty if $U_s = u_s$, and is canonical if it consists of
one element only.

An unambiguous syntax is a phrase structure syntax with
the property that for every string $x \in L(G)$, there exists
exactly one canonical parse.

An environment $E$ is a set of variables whose values
define  the  meaning  of  a  sentence.  An  interpretation  rule 
defines  an  action  (or  a  sequence  of  actions)  involving  a 
subset  of  the  environment. 

A phrase structure programming language $L_P(G, I, E)$
is a phrase structure language $L(G)$ where $G = (V, R, B, A)$
is a phrase structure syntax, $I$ is a set of (possibly
empty) interpretation rules such that a unique one-to-one
mapping exists between elements of $I$ and $R$, and $E$ is
an environment used by the elements of $I$.

The meaning $M$ of a statement $x \in L_P$ is the effect
of the execution of the sequence of interpretation rules
$I_1, I_2, \ldots, I_n$ on the environment $E$, where $r_1, r_2, \ldots, r_n$
is a canonical parse of the sentence $x$ into the symbol $A$
and $I_k$ corresponds to $r_k$ for all $k$. The meaning may
have the effect of changing values of variables or of changing
the environment by introducing or removing variables.
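
A minimal sketch of this definition of meaning follows, in Python for illustration only. It assumes a three-rule toy grammar (ASSIGN → VAR = EXPR, EXPR → EXPR + VAR, EXPR → VAR) and a canonical parse supplied by hand; the function names, the auxiliary value stack, and the pairing of each parse step with the variable name it matched are all assumptions of the sketch, not part of the definitions above.

```python
# One interpretation rule per syntactic rule. Each acts on the environment
# `env` and on a value stack (an auxiliary part of the environment).

def interpret_expr_var(env, stack, name):
    """EXPR -> VAR : fetch the value of the variable."""
    stack.append(env[name])

def interpret_expr_add(env, stack, name):
    """EXPR -> EXPR + VAR : add the variable to the value already computed."""
    stack.append(stack.pop() + env[name])

def interpret_assign(env, stack, name):
    """ASSIGN -> VAR = EXPR : store the computed value in the variable."""
    env[name] = stack.pop()

def meaning(parse, env):
    """Execute the interpretation rules in the order given by the parse."""
    stack = []
    for rule, name in parse:
        rule(env, stack, name)
    return env

# Hand-written canonical parse of the statement A = B + C.
PARSE = [(interpret_expr_var, "B"),
         (interpret_expr_add, "C"),
         (interpret_assign, "A")]

result = meaning(PARSE, {"A": 0, "B": 2, "C": 3})
```

Because the parse is unique for an unambiguous syntax, the sequence of calls, and hence the final environment, is unique as well.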

Phrase  structure  grammars  were  first  introduced  and 
studied  by  Chomsky  as  devices  for  generating  sentences  in 
natural languages. By imposing more and more severe restrictions
on the productions, four types of grammars were defined.

Type 0 grammars place no restrictions on the forms of
the productions.

Type 1 grammars (context dependent) have productions of
the form $x \to y$ where

$x = e\,U f$, $y = e\,w f$, and $w \ne \Lambda$ (the empty string).

Type 2 grammars (context free) have productions of the
form $x \to y$ where

$x = e\,U f$, $y = e\,w f$, $w \ne \Lambda$, but $U \to w$,

i.e. $U$ may be replaced by $w$ regardless of the context $e$, $f$.

Type 3 grammars (finite state) are of the form $x \to y$
where

$x = e\,U f$, $y = e\,w f$, $w \ne \Lambda$, $U \to w$, but
all productions are of the form

$w = t$ or $w = t\,U_j$, where $U_j$ is a non-terminal
and $t$ is a terminal. (Chomsky (1959), Landweber (1964))

Subsequent sections will consider the relationships
between Type 2 grammars and programming languages.

3.2 Standard Form Grammars

In order to work efficiently, and in some cases at all,
some algorithms for the syntactic analysis of phrase structure
languages prohibit infinite left recursive productions. A
non-terminal $U \in V - B$ is left recursive if there exists a
derivation $U \overset{*}{\Rightarrow} U x$ for some $x \ne \Lambda$ (the empty string). Left recursion
can be removed by transforming the grammar to standard form
in which all of the rules of $R$ are of the form:

$U \to T$ or

$U \to T\,U_1 \ldots U_n$, $n \ge 1$, where

$T \in B$ and $U_i \in V - B$.

Standard form grammars can have no infinite left-going
structures. (Greibach (1965), Galler and Perlis (1967))

Kuno's article (1966) describes a proof due to Greibach
which shows that for a given context free grammar $G$, a
standard form grammar $G_s$ can be constructed which generates
the same language as generated by $G$. However, when $G_s$ is
used to parse a string, it does not produce the same structural
descriptions as $G$. The article also contains an algorithm
designed by Abbot which converts a given context free grammar
into an augmented standard form grammar supplemented by
additional rules describing its derivation from the original
context free grammar. It is then possible to correct the
structural descriptions supplied by the standardized grammar.

Kurki-Suonio (1966) has shown that it is not necessary
to  transform  the  grammar  to  standard  form  if  removal  of  left 
recursion  is  sufficient.  His  system  entails  defining  new 
non-terminal  symbols  which  are  used  to  modify  the  current  rules. 
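
The general idea of removing left recursion by introducing new non-terminal symbols can be sketched as follows, in Python for illustration only. This is the familiar textbook transformation for immediate left recursion, written so that no empty right parts are introduced; it is not claimed to be Kurki-Suonio's procedure. The function name and the convention of naming the new symbol by appending a prime are assumptions of the sketch, as is the premise that the primed name is not already in use.

```python
def remove_immediate_left_recursion(grammar):
    """Rewrite rules U -> U a | b as U -> b | b U' and U' -> a | a U'.

    grammar maps each non-terminal to a list of alternatives, each
    alternative being a non-empty tuple of symbols.
    """
    result = {}
    for head, alternatives in grammar.items():
        # Tails 'a' of the left recursive alternatives U -> U a.
        recursive = [alt[1:] for alt in alternatives
                     if alt[0] == head and len(alt) > 1]
        others = [alt for alt in alternatives if alt[0] != head]
        if not recursive:
            result[head] = list(alternatives)
            continue
        tail = head + "'"  # hypothetical name for the new non-terminal
        result[head] = others + [alt + (tail,) for alt in others]
        result[tail] = recursive + [alt + (tail,) for alt in recursive]
    return result

# Illustrative input: a term is a factor, or a term followed by an
# operator and a factor.
LEFT_RECURSIVE = {"TERM": [("FACTOR",), ("TERM", "MULOP", "FACTOR")]}
TRANSFORMED = remove_immediate_left_recursion(LEFT_RECURSIVE)
```

The transformed grammar generates the same strings but groups them to the right, which is the change in structural description noted in connection with standard form grammars.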

3.3 Bounded Context, Operator, and Precedence Grammars

Bounded context grammars, a subset of type 2 phrase
structure grammars, are grammars which are restricted so that the
structure of a substring of a sentence may be determined by
considering a limited portion of the string. For any
specified bound on the number of contextual characters
considered, it is possible to determine if the grammar is
bounded. Bounded context grammars are free from syntactic
ambiguity and can form models for most languages used in
computer programming. (Floyd (1964a))

In  an  effort  to  design  an  efficient  syntax  oriented 
compiler,  Floyd  (1963)  developed  the  concepts  of  operator 
and  precedence  grammars.  These  grammars  are  subsets  of 
bounded  context  grammars. 

If no production of a phrase structure grammar $P$
takes the form

$U \to x\,U_1 U_2\,y$,

where $U_1$, $U_2$ are nonterminals, then $P$ is an operator
grammar and $L_P$ is an operator language. In an operator
grammar, there are three possible relations (denoted by
$\doteq$, $\gtrdot$ and $\lessdot$), which two terminal characters $T_1$ and
$T_2$ may take. The relations are defined as follows:

1. $T_1 \doteq T_2$ if there is a production $U \to x\,T_1 T_2\,y$
or $U \to x\,T_1 U_1 T_2\,y$.

2. $T_1 \gtrdot T_2$ if there is a production $U \to x\,U_1 T_2\,y$
and a derivation $U_1 \overset{*}{\Rightarrow} z$ where $T_1$ is the rightmost
terminal character of $z$.

3. $T_1 \lessdot T_2$ if there is a production $U \to x\,T_1 U_1\,y$
and a derivation $U_1 \overset{*}{\Rightarrow} z$ where $T_2$ is the leftmost
terminal character of $z$.

One,  two,  or  all  of  the  above  relations  may  hold. 

A precedence grammar is an operator grammar for which
no more than one of the above three relations holds between
any ordered pair of terminal symbols. The relations
are then called precedence relations. The precedence grammars
form models of mathematical and algorithmic languages which
may be analyzed mechanically by a simple procedure based on
a matrix representation of the precedence relations between
characters.
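
The three relations can be computed mechanically from the productions. The following Python sketch is given for illustration only and is not Floyd's or Wirth and Weber's published algorithm. It assumes a grammar stored as a dictionary from non-terminals to lists of non-empty alternatives; the strings "=", ">" and "<" stand for the three relations, and all function names are hypothetical.

```python
def edge_terminals(grammar, leftmost):
    """For each non-terminal, the terminals that can be the leftmost (or
    rightmost) terminal character of a string derived from it."""
    sets = {u: set() for u in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                seq = alt if leftmost else alt[::-1]
                found = set()
                if seq[0] in grammar:              # edge symbol is a non-terminal
                    found |= sets[seq[0]]
                    if len(seq) > 1 and seq[1] not in grammar:
                        found.add(seq[1])
                else:                              # edge symbol is a terminal
                    found.add(seq[0])
                if not found <= sets[head]:
                    sets[head] |= found
                    changed = True
    return sets

def precedence_relations(grammar):
    """Map each ordered pair of terminals to the set of relations it takes."""
    left = edge_terminals(grammar, leftmost=True)
    right = edge_terminals(grammar, leftmost=False)
    relations = {}
    for alternatives in grammar.values():
        for alt in alternatives:
            for i in range(len(alt) - 1):
                a, b = alt[i], alt[i + 1]
                if a in grammar and b in grammar:
                    raise ValueError("adjacent non-terminals: not an operator grammar")
                if a not in grammar and b not in grammar:
                    relations.setdefault((a, b), set()).add("=")
                elif a not in grammar:             # terminal, then non-terminal
                    for t in left[b]:
                        relations.setdefault((a, t), set()).add("<")
                    if i + 2 < len(alt) and alt[i + 2] not in grammar:
                        relations.setdefault((a, alt[i + 2]), set()).add("=")
                else:                              # non-terminal, then terminal
                    for t in right[a]:
                        relations.setdefault((t, b), set()).add(">")
    return relations

def is_precedence_grammar(grammar):
    """True if no ordered pair of terminals takes more than one relation."""
    return all(len(r) == 1 for r in precedence_relations(grammar).values())

# Illustrative grammars (not taken from the thesis).
ARITHMETIC = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}
AMBIGUOUS = {"E": [("E", "+", "E"), ("a",)]}
```

For `ARITHMETIC` every pair takes a single relation, so the resulting dictionary is a sparse form of the precedence matrix. For `AMBIGUOUS` the pair (+, +) takes both ">" and "<", so the grammar is an operator grammar but not a precedence grammar.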

Wirth  and  Weber  (1966)  point  out  that  precedence  grammars 
are  unambiguous  in  the  sense  that  the  sequence  of  syntactic 
reductions  applied  to  a  sentence  is  unique  for  every  sentence 
in  the  language.  Since  every  sentence  is  uniquely  analyzed 
and  each  rule  has  an  interpretation  rule,  the  definition  of 
meaning  is  exhaustive.  Thus,  every  sentence  has  one  and  only 
one  meaning,  a  necessity  for  programming  languages.  The 
authors  have  also  developed  an  algorithm  which  decides  whether 
a  given  grammar  is  a  precedence  grammar,  and  if  so  performs 
the  desired  transformation  into  data  representing  the  reductive 
form of the grammar. This reductive form can be used to
illustrate how a statement may be reduced to the head of the
grammar.

3.4 Structural Connectedness

The  theory  of  phrase  structure  grammars  was  developed 
as  a  means  of  studying  techniques  to  generate  and/or  recognize 
sentences in natural languages. Irons (1964) felt that the
Type System developed by Chomsky was too broad to be useful
for classifying the analysis algorithms for various grammars.
In its place he suggested the concept of "structural connection"
as a means of classifying various languages.

The  classification  is  based  on  the  complexity  of  the 
interaction  between  parses  on  disjoint  substrings  of  a  parsed 
string.  The  lowest  level  of  the  classification  is  structurally 
unconnected.  The  next  level,  structurally  connected,  describes 
systems  in  which  the  symbols  surrounding  a  string  determine  its 
parse.  Finally,  structurally  connected  in  depth  refers  to  a 
system  in  which  the  parse  of  one  string  depends  on  parses  of 
other  strings. 

The analyzers for various grammars can be classified
according to the class of grammar which they will process.
A recognizer for a structurally unconnected system is
basically a table-lookup algorithm. Recognizers for structurally
connected systems are efficient for a limited number
of symbols to the left or right. However, if several substructures
are present, tentative analysis may be required.
Systems which are structurally connected in depth to the left
may require a dynamic modification of the grammar specifications
or a complicated intermediate tabling procedure for left-to-right
recognizers. Recognizers for systems which are structurally
connected in depth to the right may be multiple pass or
may be non-existent.

Symbolic  languages  which  require  assemblers  to  produce 
machine  code  are  structurally  unconnected.  The  grammars 
described in this paper are termed structurally connected.
High level languages are structurally connected in depth.
Grammars  and  analyzers  for  such  languages  are  complex  and 
the  problems  encountered  cannot  always  be  solved  by  the 
techniques  presented  here. 




CHAPTER  IV 


DEFINING  THE  SYNTAX  OF  A  LANGUAGE 

4.1 Introduction

The previous chapter explained that a grammar is defined
by a set of rules or productions. One of the first steps in
the  construction  of  an  automatic  parsing  algorithm  is  the 
specification  of  the  syntax.  The  syntactic  specifications 
for  a  language  provide  a  set  of  definitions  for  the  various 
syntactic  units  which  can  be  used  in  the  language.  A  definition 
is  a  string  of  characters  and  syntactic  units. 

A  syntactic  specification  of  a  language  is  a  concise 
and  compact  representation  of  the  structure  of  that  language, 
but  it  is  merely  that  -  and  does  not  constitute  a  set  of  rules 
either  for  producing  allowable  strings  in  the  language  or  for 
recognizing  whether  or  not  a  proffered  string  is  in  fact  an 
allowable string. The parsing of a proffered string is
performed by syntactic analyzer algorithms which use the syntactic
specifications and the string as input and produce some
representation of the structure of the statement, if possible.
(Cheatham  (1964)) 

4.2 Metalanguages

Metalanguages are used to provide an orderly and compact
listing of the rules which specify the syntax of the language.
Since syntax specifications are essential to parsing algorithms,
the design of an algorithm can be influenced by the metalanguage
utilized. Gorn (1961b) describes a number of
techniques  for  specifying  a  grammar.  Some  of  the  methods 
are  suitable  for  the  generation  of  statements  in  the  language 
while  others  simplify  the  process  of  recognizing  legitimate 
statements  of  the  language.  The  systems  described  by  Gorn 
range over natural languages and subsets thereof, logical
expressions,  networks,  matrices,  tree  notation,  and  flow 
diagrams.  A  particular  method  is  chosen  on  the  basis  of  its 
ability  to  clarify  the  concepts  involved  and  simplify  the 
algorithms  which  will  use  the  specifications.  The  ease  of 
modifying  the  specifications  is  also  important. 

The  following  sections  contain  descriptions  of  selected 
metalanguages.  Figures  1  through  6  are  examples  of  these 
metalanguages.

4.3 Backus Normal Form

The most common metalanguage is Backus Normal Form (BNF).
It  was  developed  by  J.W.  Backus  and  first  appeared  in  the 
ALGOL  60  Report.  BNF  is  a  subset  of  natural  language.  It  is 
easily  interpreted  by  software  designers  and  can  be  coded  for 
use  by  parsing  algorithms. 

In the original BNF system, metalinguistic formulae are
constructed from the following conventions: Metalinguistic
variables, whose values are sequences of symbols, are represented
by sequences of characters enclosed in brackets < >.
The marks ::= (equivalent to) and | (or) are metalinguistic
connectives. In a formula, any mark which is not
a metalinguistic variable or connective, denotes itself (or
the class of marks which are equivalent to it). Juxtaposition
of marks and/or variables in a formula signifies juxtaposition
of the sequences denoted. Usually the symbols within brackets
are chosen to be words describing approximately the nature
of the corresponding variable. The original version of BNF
has been modified to meet various requirements.
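
As an indication of how such formulae may be coded for use by a program, the following Python sketch (for illustration only) splits a formula into its left part and its alternatives and checks that every metalinguistic variable used is also defined. The function names and the sample formulae are assumptions of the sketch. It also assumes that symbols are separated by blanks, that variable names contain no blanks, and that the mark | is never itself a symbol of the language being described.

```python
def parse_bnf_rule(line):
    """Split '<NAME> ::= alt | alt' into (name, list of alternatives)."""
    left, right = line.split("::=")
    alternatives = [alt.split() for alt in right.split("|")]
    return left.strip(), alternatives

def undefined_variables(rules):
    """Return the bracketed variables that are used but never defined."""
    used = {symbol
            for alternatives in rules.values()
            for alt in alternatives
            for symbol in alt
            if symbol.startswith("<") and symbol.endswith(">")}
    return used - set(rules)

# Hypothetical sample formulae in the original BNF conventions.
LINES = [
    "<TERM> ::= <FACTOR> | <TERM> <MULOP> <FACTOR>",
    "<MULOP> ::= * | /",
    "<FACTOR> ::= X | Y",
]
BNF_RULES = dict(parse_bnf_rule(line) for line in LINES)
```

A table of this kind, rather than the text of the formulae, is what a syntax directed analyzer would consult while parsing.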

Irons  (1963b)  implemented  changes  in  BNF  to  facilitate 
its  use  in  programming  systems.  The  first  of  these  was  the 
removal of left recursion so that no definitions of the forms


<A> ::= <A> <B>


or 


were  allowed.  To  offset  this  restriction,  an  "iterative 
power"  was  introduced.  Any  set  of  metalinguistic  variables 
enclosed  by  the  braces  {  }  is  specified  to  occur  zero  or 
more  times  in  an  input  string.  The  restriction  that  the 
brace { may not occur immediately after ::= must obviously
be applied.


<LETTER>      ::= A | B | C | ... | Z
<DIGIT>       ::= 0 | 1 | ... | 9
<MULOP>       ::= × | ÷
<ADDOP>       ::= + | -
<VARIABLE>    ::= <LETTER> | <VARIABLE> <LETTER>
<INTEGER>     ::= <DIGIT> | <INTEGER> <DIGIT>
<FACTOR>      ::= <VARIABLE> | <INTEGER> | ( <ARITH EXPR> )
<TERM>        ::= <FACTOR> | <TERM> <MULOP> <FACTOR>
<ARITH EXPR>  ::= <TERM> | <ARITH EXPR> <ADDOP> <TERM>
<ASSIGNMENT>  ::= <VARIABLE> = <ARITH EXPR>
<PROGRAM>     ::= <ASSIGNMENT> | <PROGRAM> ; <ASSIGNMENT>

Figure 1.

ORIGINAL BACKUS NORMAL FORM


<LETTER>      ::= A | B | ... | Z
<DIGIT>       ::= 0 | ... | 9
<MULOP>       ::= × | ÷
<ADDOP>       ::= + | -
<VARIABLE>    ::= <LETTER> {<LETTER>}
<INTEGER>     ::= <DIGIT> {<DIGIT>}
<FACTOR>      ::= <VARIABLE> | <INTEGER> | ( <ARITH EXPR> )
<TERM>        ::= <FACTOR> {<MULOP> <FACTOR>}
<ARITH EXPR>  ::= <TERM> {<ADDOP> <TERM>}
<ASSIGNMENT>  ::= <VARIABLE> = <ARITH EXPR>
<PROGRAM>     ::= <ASSIGNMENT> {; <ASSIGNMENT>}

Figure 2.

IRONS’ NOTATION

A subset of ALGOL was described in both Backus notation
and Irons’ notation. The two descriptions of the grammar are
given in Appendix B.
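
The practical effect of the iterative braces is that each definition can be recognized with a simple loop instead of a left recursive call. The following Python sketch, given for illustration only, recognizes programs of the small grammar of Figure 2. It is a hand-written recognizer, not one of the syntax directed analyzers studied in this thesis, since the rules are embedded in the code. The class and method names are hypothetical, blanks are ignored, and the ASCII characters * and / are assumed as stand-ins for the multiplying operators.

```python
LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
DIGITS = set("0123456789")
ADDOPS = {"+", "-"}
MULOPS = {"*", "/"}  # stand-ins assumed for this sketch

class Recognizer:
    def __init__(self, text):
        self.text = text.replace(" ", "")
        self.pos = 0

    def peek(self):
        """Return the next character, or '' at the end of the input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def one_or_more(self, charset):
        """<X> {<X>}: one required element, then zero or more repetitions."""
        if self.peek() not in charset:
            return False
        while self.peek() in charset:
            self.pos += 1
        return True

    def factor(self):
        if self.one_or_more(LETTERS) or self.one_or_more(DIGITS):
            return True
        if self.peek() == "(":
            self.pos += 1
            if self.arith_expr() and self.peek() == ")":
                self.pos += 1
                return True
        return False

    def term(self):
        """<FACTOR> {<MULOP> <FACTOR>}"""
        if not self.factor():
            return False
        while self.peek() in MULOPS:
            self.pos += 1
            if not self.factor():
                return False
        return True

    def arith_expr(self):
        """<TERM> {<ADDOP> <TERM>}"""
        if not self.term():
            return False
        while self.peek() in ADDOPS:
            self.pos += 1
            if not self.term():
                return False
        return True

    def assignment(self):
        if not self.one_or_more(LETTERS) or self.peek() != "=":
            return False
        self.pos += 1
        return self.arith_expr()

    def program(self):
        """<ASSIGNMENT> {; <ASSIGNMENT>}, consuming the whole input."""
        if not self.assignment():
            return False
        while self.peek() == ";":
            self.pos += 1
            if not self.assignment():
                return False
        return self.pos == len(self.text)
```

For example, `Recognizer("X = (A + 12) * B; Y = X").program()` accepts the string, while a string such as `A = + B` is rejected because no factor follows the replacement sign.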

Iverson (1964) added conventions to BNF to provide a
mode  of  description  which  is  more  compact  and  easier  to 
prepare  and  use  than  standard  BNF  descriptions.  The  new 
conventions  are: 

1.  to  number  the  syntactic  definitions  sequentially 
and  use  the  sequence  number,  rather  than  the  name, 
in  all  references.  Single  letter  mnemonics  are 
used  for  the  basic  alphabet,  i.e.  terminal  symbols. 

2.  to  use  an  asterisk  to  denote  the  syntactic  form 
being  defined.  Thus  a  recursive  definition,  in 
which  a  syntactic  unit  is  defined  in  terms  of 
itself,  involves  an  asterisk. 

3.  to denote any set of symbols by enclosing the list
of symbols in braces { } and apply the set operators
∪ (union) and ∆ (difference) to any set or syntactic
class.

4.  to list all synonyms separately and supply only one
definition.

Reference   Name         Definition

d           digit        0 | 1 | ... | 9
l           letter       A | B | C | ... | Z
1           variable     l | * l
2           integer      d | * d
3           factor       1 | 2 | ( 5 )
4           term         3 | * {× ÷} 3
5           arith expr   4 | * {+ -} 4
6           assignment   1 = 5
7           program      6 | * ; 6

FIGURE 3

4.4 Tree Notation and Related Forms

Tree notation can be used to represent either the grammar
of a language or the structure of a statement. Trees consist
of terminal and non-terminal nodes which are connected by
directed paths called branches.

Trees  can  be  classified  according  to  the  direction  of 
the  connecting  branches.  A  "top  down"  tree  consists  of  a 
main  node  or  root  from  which  branches  descend  to  other  nodes. 
The other nodes have entrance branches but do not require
exit branches. Nodes with no exit branches are referred to as
terminal  nodes  while  nodes  having  both  entrance  and  exit 
branches  are  non-terminal  nodes.  The  information  on  a  tree 
may  be  contained  in  the  nodes,  branches,  or  both.  A  "bottom 
up"  tree  is  similar  except  the  branches  are  directed  from  the 
terminal  nodes  to  the  root. 
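The terminology above can be made concrete with a small sketch. The `Node` class, its field names, and the sample tree are assumptions made for illustration only; they are not taken from any of the systems reviewed here. The example builds a top-down tree for the statement A = B and classifies its nodes by whether they have exit branches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a top-down tree; `children` holds its exit branches."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        # A terminal node has an entrance branch but no exit branches.
        return not self.children


def terminals(node: Node) -> List[str]:
    """Collect terminal labels left to right by following branches downward."""
    if node.is_terminal():
        return [node.label]
    found: List[str] = []
    for child in node.children:
        found.extend(terminals(child))
    return found


# Hypothetical tree for the statement A = B (root at the top).
tree = Node("ASSIGNMENT", [
    Node("VARIABLE", [Node("LETTER", [Node("A")])]),
    Node("="),
    Node("ARITH EXPR", [Node("TERM", [Node("FACTOR", [
        Node("VARIABLE", [Node("LETTER", [Node("B")])])])])]),
])

print(terminals(tree))
```

Reading the terminal nodes from left to right recovers the original statement, which is why the nodes alone suffice to represent a parse.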

Taylor, Turner and Waychoff (1961) converted the
conventional BNF notation into tree notation or syntactical
charts in order to condense the specifications and simplify
the cross referencing of definitions. In this system the
shapes of the enclosures and the directions of the connective
arrows have special meanings. Charts have not yet been used
in automatic parsing algorithms. This is probably due to
difficulties in representing them for the algorithms.

When trees are used to represent a parse of a statement,
there is only one possible way of connecting the variables,
i.e. there can be no alternatives. In this case the nodes
themselves can be used to represent the symbols involved.

[Figure 4: syntactical chart; nodes include PROGRAM, ASSIGNMENT,
VARIABLE, ARITH EXPR, ADDOP and TERM. The legend lists the
symbols that enclose all terminal characters; indicate all the
relevant definitions; connect the basic symbols and
metalinguistic variables which form a definition; mark the
point at which a metalinguistic variable is defined; give the
number of other occurrences of a metalinguistic variable; and
give the coordinates of the point at which the definition of
the enclosed metalinguistic variable is found.]

FIGURE 4

[Figure 5: parse tree for the program A = B + C ; C = D]

FIGURE 5

PARSE OF A PROGRAM

Graham (1964) presents a number of methods for
representing the information contained in a tree through use of
matrices and linear sequences. The linear sequences are
related to Polish Notation. In this notation the structure
of the tree is represented by the symbols involved, coupled
with special operators representing alternation, catenation,
and replacement. To derive the meaning from such a linear
sequence it is necessary to scan the string and collect the
operators and their operands. (Hamblin (1962))
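As a rough illustration of a linear sequence related to Polish notation, the sketch below flattens a tree into prefix form and rebuilds it by scanning the sequence and collecting each operator with its operands. It is not Graham's representation: ordinary binary operators stand in for his alternation, catenation, and replacement operators, and the tuple encoding of trees is an assumption made for the example.

```python
from typing import List, Tuple, Union

# A tree is either a leaf symbol or a triple (operator, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
OPERATORS = {"+", "-", "=", ";"}  # assumed binary operators


def to_prefix(tree: Tree) -> List[str]:
    """Write the operator first, then its two operands (prefix order)."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return [op] + to_prefix(left) + to_prefix(right)


def from_prefix(symbols: List[str]) -> Tree:
    """Scan the sequence once, collecting each operator with its operands."""
    def read(pos: int):
        sym = symbols[pos]
        if sym in OPERATORS:
            left, pos = read(pos + 1)
            right, pos = read(pos)
            return (sym, left, right), pos
        return sym, pos + 1

    tree, end = read(0)
    if end != len(symbols):
        raise ValueError("extra symbols after a complete tree")
    return tree


# The program of Figure 5, A = B + C ; C = D, as a tree.
program = (";", ("=", "A", ("+", "B", "C")), ("=", "C", "D"))
linear = to_prefix(program)
print(" ".join(linear))
```

No parentheses are needed in the linear form because each operator is assumed to take exactly two operands, so the scan always knows how many operands to collect.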

4.5 Transition Diagrams

A  transition  diagram  is  a  network  of  nodes  and  connecting 
lines  or  paths  defining  one  or  more  syntactic  units.  Each 
transition  diagram  has  one  entrance  node  and  one  or  more  exit 
nodes.  The  connecting  lines  represent  terminal  symbols  or 
syntactic  units  or  they  may  be  blank.  No  two  paths  leading 
from  a  node  may  be  blank  or  may  have  the  same  symbol  on  them. 
Furthermore,  no  transition  diagram  may  have  a  sequence  of 
blank  paths  leading  from  the  entrance  node  to  an  exit  node  nor 
may  a  set  of  blank  paths  contain  a  loop. 

Two  restrictions  are  placed  on  transition  diagrams  to 
make  them  useful  in  translators.  The  "No  Loop  Condition"  states 
that  no  transition  diagram  will  make  reference  to  itself  without 
first  processing  a  terminal  character.  The  "No  Backup  Condition" 
requires  that  once  a  symbol  is  read,  the  syntactic  unit  of  which 
it  is  part  can  be  determined  without  looking  back  in  the  input 
string. 
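The sketch below shows one way such diagrams might be coded and traversed. It is not Conway's implementation: the table layout, the node numbering, the use of `*` and `/` for the multiplication operators, and the function names are all assumptions. An edge carrying a set of characters stands for one path per character. The choice of path is made from the current input symbol alone, so the input pointer never moves backward.

```python
import string

T, U, BLANK = "terminal", "unit", "blank"

# diagram name -> node -> list of (kind, label, next node).
# Node 0 is the entrance node; "exit" is the exit node. Blank paths come last.
DIAGRAMS = {
    "VARIABLE": {0: [(T, string.ascii_uppercase, 1)],
                 1: [(T, string.ascii_uppercase, 1), (BLANK, None, "exit")]},
    "INTEGER":  {0: [(T, string.digits, 1)],
                 1: [(T, string.digits, 1), (BLANK, None, "exit")]},
    "FACTOR":   {0: [(U, "VARIABLE", "exit"), (U, "INTEGER", "exit"),
                     (T, "(", 1)],
                 1: [(U, "EXPR", 2)],
                 2: [(T, ")", "exit")]},
    "TERM":     {0: [(U, "FACTOR", 1)],
                 1: [(T, "*/", 0), (BLANK, None, "exit")]},
    "EXPR":     {0: [(U, "TERM", 1)],
                 1: [(T, "+-", 0), (BLANK, None, "exit")]},
}


def first(unit):
    """Characters that can begin a unit; terminates under the No Loop Condition."""
    chars = set()
    for kind, label, _ in DIAGRAMS[unit][0]:
        if kind == T:
            chars |= set(label)
        elif kind == U:
            chars |= first(label)
    return chars


def traverse(unit, text, pos, trace):
    """Walk one diagram from entrance to exit; return the new input position."""
    node = 0
    trace.append(unit)
    while node != "exit":
        ch = text[pos] if pos < len(text) else ""
        for kind, label, target in DIAGRAMS[unit][node]:
            if kind == T and ch and ch in label:
                pos, node = pos + 1, target
                break
            if kind == U and ch in first(label):
                pos, node = traverse(label, text, pos, trace), target
                break
            if kind == BLANK:
                node = target
                break
        else:
            raise ValueError(f"unexpected symbol at position {pos}")
    return pos


def recognize(text, unit="EXPR"):
    """Return the diagrams entered, in order, or raise at the failing symbol."""
    trace = []
    end = traverse(unit, text, 0, trace)
    if end != len(text):
        raise ValueError(f"unexpected symbol at position {end}")
    return trace


print(recognize("A+B*2"))
```

Because no backup occurs, the position reported on failure is the symbol at which the analysis could not continue.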


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Transition  diagrams  are  easily  understood  by  systems 
designers  and  readily  coded  for  use  by  a  translator.  However, 
the  design  of  diagrams  is  neither  straightforward  nor  easy  to 
describe.  (Conway  (1963)) 

4.6 Syntax Specification Difficulties

Languages which can be readily parsed are context
independent or of Type 2 and the grammars for these languages
can be specified in Backus Normal Form. However, in the
construction of syntax oriented compilers specification problems
occur because:

1.  programming  languages  are  not  strictly  Type  2, 

2.  semantic  clarity  and  unambiguity  are  necessary 
requirements,  and 

3.  the analysis algorithms must have the specifications
in the proper form.

Although  the  structure  of  a  programming  language  can 
be  specified  in  BNF,  non-syntactic  rules  cannot.  One  such 
rule  is  that  a  variable  cannot  denote  two  or  more  distinct 
data  structures  in  a  program.  One  solution  to  this  problem 
involves  using  two  sets  of  specifications  -  one  supplying 
the  syntax  and  the  other  the  non-syntactic  rules. 

Some statements may introduce a context dependent aspect
into the analyses. For example, declaration statements inform
the analyzer as to what variables can appear in subsequent
statements of the program and thus affect the meaning of the
statements. A dynamic changing of the grammar may solve this
problem.

In order to produce semantic clarity, it may be
necessary to introduce extra syntactical units. This may
inadvertently result in an ambiguous grammar so that more
than one structure, and hence meaning, may be assigned to
one statement. However, no algorithm exists to determine if
an arbitrary Type 2 phrase structure grammar is unambiguous.
For a bounded context grammar, procedures exist for determining
whether or not the system is ambiguous. Since it is possible
to determine if a given grammar is of bounded context, the
ambiguity problem can be solved indirectly. (Davis (1966),
Caracciolo di Forino (1963), Floyd (1962))

An analysis algorithm may require the specifications
to be free of left recursion, described earlier. Also, problems
may arise because of the necessity of having the syntax
specified in an order permitting the analyzer to parse the
statements correctly. (Metcalfe (1964))
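One way to meet the first requirement is to rewrite each directly left recursive definition in the iterative form of Irons’ notation, as Figure 2 does for the grammar of Figure 1. The sketch below is a minimal illustration under stated assumptions: the dictionary representation and function names are hypothetical, every alternative is assumed non-empty, and only direct left recursion (a unit whose definition begins with itself) is handled.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[List[str]]]

# Three definitions from Figure 1; each alternative is a list of symbols.
GRAMMAR: Grammar = {
    "TERM": [["FACTOR"], ["TERM", "MULOP", "FACTOR"]],
    "ARITH EXPR": [["TERM"], ["ARITH EXPR", "ADDOP", "TERM"]],
    "ASSIGNMENT": [["VARIABLE", "=", "ARITH EXPR"]],
}


def split_left_recursion(name: str, grammar: Grammar) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (bases, tails) so that  name ::= b | name t  reads  name ::= b {t}."""
    bases = [alt for alt in grammar[name] if alt[0] != name]
    tails = [alt[1:] for alt in grammar[name] if alt[0] == name]
    return bases, tails


def to_iterative(name: str, grammar: Grammar) -> str:
    """Render one definition with braces denoting zero or more repetitions."""
    bases, tails = split_left_recursion(name, grammar)
    head = " | ".join(" ".join(alt) for alt in bases)
    if not tails:
        return f"{name} ::= {head}"
    body = " | ".join(" ".join(alt) for alt in tails)
    if len(bases) > 1:
        head = f"( {head} )"
    return f"{name} ::= {head} {{ {body} }}"


for unit in GRAMMAR:
    print(to_iterative(unit, GRAMMAR))
```

The rewritten definitions describe the same strings, but a top-down analyzer no longer has to pursue the unit being defined as its own first subordinate goal.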

The above problems necessitate a study of the syntax
specifications in relation to the programming language, analysis
algorithm, and that portion of the compiler which produces the
machine code.


CHAPTER  V 


SYNTAX  ORIENTED  COMPILERS 


5.1 Compilers

In  a  discussion  of  the  general  properties  of  programming 
language  processors,  Davis  points  out  that  any  processor  must 

1.  linearly  scan  the  source  program  to  recognize  and 
code  the  symbols  used, 

2.  determine  the  syntactic  structures  by  isolating  all 
syntactic  types  based  on  the  syntactic  specifications 
provided, 

3.  discover all syntactic ambiguities and violations
of vocabulary or grammatical rules and act accordingly,

4.  translate the program from source to target language
based on the structure and semantic specifications
provided,

5.  optimize  the  target  language  for  the  particular 
machine,  and 

6.  produce  the  machine  code  program. 

The  three  classes  of  compilers  are  conventional,  syntax 
oriented,  and  list  processing  compilers. 

The conventional approach has the language and machine
specifications programmed into the compiler algorithm. Although
it appears this approach produces the fastest compiler and most
efficient machine code, changes in the source language
necessitate changes in the compiler program.

List processors utilize rules governing the form,
content, and linkages between variable length pieces of data
called lists. No rigid syntactic specifications are used.
A program in a list processing language is a series of actions
to be performed on lists. Because it is not possible to
permanently allocate data storage, list processors only determine
the structure of the program and set up tables of processing
routines to handle the lists.

Syntax oriented compilers utilize tables to supply the
required information about the source and target languages
in the compiling process. These compilers are easy to write
and simplify the requirements for making changes in the
languages. However, it is reasonable to assume that the
speed of compilation and efficiency of the machine code
produced are inversely related to the number of restrictions
placed on the grammar.

5.2 A General Syntax Oriented Compiler

The simplicity and generality of the syntax oriented
approach are two strong reasons for using it in the
construction of compilers for all programming languages. A
syntax oriented compiler for translating programs in a source
language L to machine code for a machine M requires the
following preliminary steps before becoming operational.
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1.  The  formal  syntax  of  L  is  loaded  into  the  system 
and the representation of the grammar required by
the syntactic analyzer is developed.

2.  The semantics or meaning of the syntactic structures
of L in terms of the machine M are loaded and
processed.

When in operation, the compiler will translate syntactic
constructs to semantic constructs based on the formalization
of syntax and semantics, but not on any particular
representations. The basic outline of such a system is given in Figure
7. Only the part to the right of the double line is required
for the compiler. (Feldman (1966))

A  syntax  oriented  compiler  has  five  major  components: 

1.  The  loader  converts  the  source  program  to  a 
representation  used  by  the  compiler.  It  can  also 
perform  minor  functions  such  as  removal  of  comments, 
and  detection  of  simple  errors. 

2.  The  syntactic  analyzer  parses  the  program,  detects 
and  processes  errors,  and  produces  a  syntactic 
representation  of  the  program  in  tree  notation. 

3.  The  generator  is  a  set  of  tables  containing  a 
generator  strategy  to  be  applied  at  each  node  of 
the  tree.  The  generator  strategies  describe  each 
type  of  node  and  list  the  actions  to  be  taken.  The 
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actions  are  either  to  proceed  to  the  neighbouring 
node  or  produce  the  macro  instructions. 

4.  The  macro  accumulator  optimizes  the  instructions 
produced  by  the  generator. 

5.  The  code  selector  produces  the  machine  code  from 
the  optimized  instructions. 

Two  types  of  syntax  oriented  compilers  have  been 
developed.  They  are  classified  according  to  the  manner  in 
which  they  use  the  syntactic  specifications  to  derive  the 
structure  of  the  program.  The  compilers  are  referred  to  as 
either  syntax  directed  or  syntax  controlled.  (Graham  (1964)) 

5.3 Syntax Controlled Analyzers

Syntax  controlled  analyzers  use  tabulations  of  the 
original  grammar  to  determine  the  structure  of  the  program 
being  compiled.  The  tabulations  are  produced  by  preliminary 
processing  algorithms  which  construct  matrices,  tables,  and 
lists  describing  the  permissible  constructions  of  the  grammar. 
Unlike  syntax  directed  systems,  it  may  be  impossible  to 
construct  the  original  grammar  from  these  tabulations.  Parsing 
algorithms  for  this  class  can  be  considered  as  non-predictive 
in  that  known  parameters  are  used  to  decide  what  actions  are 
to  be  taken.  Syntax  controlled  analyzers  have  been  developed 
by  Floyd  (1963)  and  Wirth  and  Weber  (1966);  Evans  (1964)  and 
Feldman  (1966);  and  Eickel,  Paul,  Bauer  and  Samelson  (1963). 


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As  these  analyzers  are  applicable  to  restricted  grammars  such 
as operator and precedence grammars, they will not be
considered further (Galler et al (1967)).

5.4 Syntax Directed Analyzers

Syntax directed analysis is any procedure which is
capable of constructing a syntax tree for an arbitrary
program in an arbitrary phrase structure language. The syntax
tree is a structured representation of the information
contained in the source program and indicates the relationships
among the syntactic units formed by the terminal characters.
In a compiler, suitable processes translate the tree into a
machine code program or derivation tree for an equivalent
program in another language.

In the production of the syntax tree, syntax directed
analyzers use a complicated hierarchy of goals in an attempt
to attain a principal goal, i.e. a program. Two general
methods are possible. Top-down parsing begins by looking for
the principal goal and then substitutes subordinate goals.
Left recursive definitions may pose problems for this approach
as the possibility of an infinite number of subordinate goals
arises. Bottom-up parsing begins by considering the terminal
characters and attempts to construct the higher elements. In
each of these processes, predictions are made as to how the
program is constructed. If a prediction proves false at some
stage, a new prediction is made. A history of the successful
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predictions  supplies  the  required  syntax  tree.  (Floyd  (1964b), 
Cheatham  and  Sattley  (1964)) 
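The top-down method can be sketched as follows. This is an illustrative model only, not the algorithm of Floyd or of Cheatham and Sattley: the grammar is a small expression grammar written without left recursion, `*` and `/` stand in for the multiplication operators, and all names are assumptions. Each alternative of a definition is treated as a prediction; when a prediction fails, the next alternative is tried, and the successful predictions form the tree.

```python
from typing import Dict, Iterator, List, Tuple

# Right recursive on purpose: a definition beginning with its own name
# would make parse() call itself at the same position without end.
GRAMMAR: Dict[str, List[List[str]]] = {
    "EXPR":   [["TERM", "ADDOP", "EXPR"], ["TERM"]],
    "TERM":   [["FACTOR", "MULOP", "TERM"], ["FACTOR"]],
    "FACTOR": [["LETTER"], ["(", "EXPR", ")"]],
    "ADDOP":  [["+"], ["-"]],
    "MULOP":  [["*"], ["/"]],
    "LETTER": [[c] for c in "ABCD"],
}


def parse(goal: str, text: str, pos: int) -> Iterator[Tuple[object, int]]:
    """Yield (tree, next position) for every way `goal` matches text[pos:]."""
    if goal not in GRAMMAR:                    # goal is a terminal character
        if text.startswith(goal, pos):
            yield goal, pos + len(goal)
        return
    for alternative in GRAMMAR[goal]:          # each alternative is a prediction
        yield from match_sequence(goal, alternative, text, pos, [])


def match_sequence(goal, symbols, text, pos, children):
    """Satisfy the subordinate goals in `symbols` one after another."""
    if not symbols:
        yield (goal, *children), pos
        return
    for subtree, nxt in parse(symbols[0], text, pos):
        yield from match_sequence(goal, symbols[1:], text, nxt,
                                  children + [subtree])


def first_parse(text: str, goal: str = "EXPR"):
    """Return the first tree that accounts for the whole string, else None."""
    for tree, end in parse(goal, text, 0):
        if end == len(text):
            return tree
    return None


print(first_parse("A+B"))
```

For the input A+B the prediction that TERM is FACTOR MULOP TERM fails at the `+`, so the analyzer falls back to TERM as a single FACTOR; only that successful prediction appears in the resulting tree.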

Normally,  longer  programs  require  longer  analysis  times. 
For  syntax  directed  systems  it  is  not  known  in  general  whether 
this  increase  is  exponential  or  not.  However,  techniques 
similar  to  those  used  by  human  analysts  can  be  incorporated 
into  the  algorithms  to  reduce  the  number  of  attempts  resulting 
in  incorrect  structures.  Conway  (1963)  limits  the  choices  of 
alternatives  by  examining  the  first  character  or  word  of  the 
constructions.  Ingerman  (1966)  and  Irons  (1963b)  use  matrices 
which  indicate  if  a  particular  terminal  character  can  occur  in 
a  specified  definition.  (Floyd  (1964b)) 

Models  of  syntax  directed  compilers  and  examples  of 
parsed  statements  are  given  in  the  appendix. 

5.5 Advantages

The ease of compiler construction and ease of introducing
changes in the source language are the main reasons for
attempting to develop syntax directed compilers. However, additional
advantages have become apparent with the development of such
compilers.

One benefit arises from the necessity of adequately
specifying the syntax. This requirement should improve
programming languages by removing most ambiguities before the
language is released for general use and by simplifying
corrections when they are required. Furthermore, analyzers
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capable  of  determining  all  possible  parses  of  a  string  can 
detect  ambiguous  definitions  in  a  programming  language,  as 
two or more parses would result. (Irons (1963b)) A related
advantage is that a complete syntax directed compiler for a
particular combination of source language and computer would
provide a rigorous and complete documentation of the languages
(source and target) along with the method of translation involved.
(Metcalfe (1964))

Graham  (1964)  contends  that  most  optimization  algorithms 
such  as  elimination  of  common  subexpressions  and  optimum 
evaluation  of  Boolean  expressions  are  much  simpler  when 
applied  to  some  intermediate  form  rather  than  to  the  original 
expression  or  the  final  machine  language  version.  Iverson 
(1962)  describes  a  process  for  reducing  Polish  notation  to  a 
form  in  which  the  number  of  operands  and  operators  is  a  minimum. 

Leavenworth  (1966)  describes  a  translation  approach 
which  allows  one  to  extend  the  syntax  and  semantics  of  a  given 
high-level  base  language  by  the  use  of  syntax  macros.  The 
syntax  macros  are  used  to  define  new  statements  and  expressions 
by  means  of  the  syntactic  units  in  the  base  language  rather 
than  machine  instructions.  A  syntax  macro  has  two  parts: 

1.  A  macro  structure  which  describes  the  syntax  of  the 
source  text  to  be  recognized; 

2.  A macro definition which describes the semantics of
the corresponding macro structure in terms of the
base language.

When the compiler recognizes a unit defined by a syntax
macro, it expands the unit into the source code presented in
the macro definition for further translation. Thus the
flexibility of the base language can easily be increased
without new semantic definitions in machine language and
without the addition of special symbols.

5.6 Disadvantages

The implementation of practical syntax oriented compilers
has been delayed by a number of problems. Processing
syntactically incorrect strings is foremost among these. The compilers
can usually detect the existence of errors but may have
difficulty in pinpointing the exact location and type of
error. As a result, meaningful diagnostic routines are
difficult to construct. Furthermore, after an error is
encountered it may be difficult to correct the effect of the
error on the compiler sufficiently to enable the system to
analyze subsequent parts of the input string. (Irons (1963a))

A number of algorithms have been proposed to solve this
problem. Conway’s transition diagrams utilize a "NO BACKUP"
condition on the input string. The programmer then knows how
far the compiler was able to proceed before failing. Special
"error" syntactic units can also be defined. These units
would be established as correct goals in case of program errors.
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Irons  (1963b)  developed  a  multi-parse  syntactic  analyzer 
which  preserves  all  possible  parses  for  a  program.  An  error 
condition  reduces  the  number  of  existing  parses  to  zero  at 
the  error  or  shortly  after. 

Problems arising from context dependent features of a
language were mentioned in relation to the methods of
specifying the syntax. Declaration statements are an example of
context dependent statements. Normally these do not require
the generation of executable code nor do they modify the
structure of the program. However, they may change the syntax
through coded information in the symbol table for the program
variables. (Cheatham et al (1964))

Although the intermediate form produced by syntax oriented
compilers aids the optimization of coding, it is still necessary
to do a large amount of work to produce highly efficient code.
Automatic parsing algorithms supply the structure of a program
but attempts must be made to simplify the description of the
structure as an aid in the production of "optimized" machine
code (Irons (1963a), Cheatham et al (1964)).

5.7 Non-Compiler Applications

Although syntax oriented techniques have been developed
to parse statements in programming languages, a number of
different applications have been suggested and experimented
with.

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Metcalfe  (1964)  suggested  that  requests  to  information 
storage  and  retrieval  systems  be  expressed  in  a  restricted 
form  of  a  natural  language.  A  syntax  analyzer  could  convert 
the  natural  language  to  instructions  which  the  system  would 
use  to  search  for  the  requested  information. 

Problems  of  data  input  for  digital  computers  may  also 
be  solved  by  syntax  oriented  techniques.  The  interpretation 
of  free  and  fixed  format  control  cards  is  a  possible  example. 
Raphael  (1966)  proposed  the  use  of  modified  syntax  analyzers 
as  a  means  of  translating  input  from  graphic  display  devices 
and  generating  the  necessary  machine  instructions. 

Automatic  algebraic  manipulations  may  also  be  implemented 
through  these  processes.  Schon  (1965)  developed  an  algorithm 
which  could  perform  analytic  differentiation  using  syntax 
oriented  techniques. 

Parsing algorithms can be applied to any structure,
whether it be linear, spatial, temporal, or otherwise. Kirsch
(1964) points out that the two main methods for recording
information - pictures and text - are closely related. Thus,
computers may interpret pictures through syntactic descriptions
and provide a coded text-oriented description. This idea is
similar to one proposed by Narasimhan (1966) in which
statements are used to describe the various aspects of a picture
or object.




The  descriptions  are  built  up  through  the  use  of 
attributes  and  primitive  objects.  Primitive  objects,  the 
constituent  parts  of  the  pictures,  correspond  to  the  terminal 
characters  of  languages.  Attributes  are  the  characteristics 
which  pictures  could  have  and  define  the  non-terminal  units. 
Irons  (1963a)  proposed  combining  a  multiparse  algorithm  with 
a  pattern  recognition  device  to  offer  several  interpretations 
of  a  picture  and  a  weight  for  each.  These  techniques  could  be 
applied  to  the  analysis  of  bubble  chamber  pictures,  letter 
recognition  problems,  etc. 
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CHAPTER  VI 


THREE  SYNTAX  DIRECTED  ANALYZERS 

The remainder of this thesis is concerned with a
description of three algorithms for parsing strings in
relation to a set of syntactic specifications. These
algorithms can be used as the second major component of
syntax directed compilers or as the basis of systems
suggested in Section 5.7, Non-Compiler Applications.

Although Backus Normal Form (or a modification thereof)
is easily understood by compiler designers, it is of little
direct value to an automatic analyzer. Thus, an essential
step in the construction of a parsing algorithm is
preprocessing the BNF.

Since computer algorithms operate most efficiently
with numbers, the prose descriptions of BNF are converted
to numerical codes. A number of utility routines based on
a paper by Williams (1959) facilitate this conversion process.
These numerical codes are processed by algorithms which
produce tables to be used by the syntax directed analyzers.

The  analyzers  given  here  include  a  conventional  analyzer, 
a  multiple  parse  analyzer  and  a  transition  diagram  analyzer. 

The  descriptions  and  listings  of  the  programs  along  with 
sample  runs  are  given  in  the  appendices. 
6.1 A Conventional Analysis Algorithm

One  of  the  first  syntax  directed  compilers  was  developed 
by  Irons  (1961).  His  parsing  algorithm  required  the  syntax 
to  be  specified  in  semi-linked  lists.  To  increase  processing 
speed,  he  developed  an  acceptance  matrix  which  indicated  if  a 
particular  metacomponent  could  be  developed  as  a  first  element 
in  the  definition  of  a  metaresult.  Cheatham  and  Sattley  (1964) 
developed  a  related  system  which  is  capable  of  processing  left 
recursive  definitions.  However,  this  system  does  not  use  an 
acceptance matrix. Ingerman, in his book "A Syntax-Oriented
Translator", describes a similar analyzer and provides an
algorithm for developing an acceptance matrix. The algorithm
presented here is Cheatham’s system modified to take advantage
of an acceptance matrix.
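The idea of an acceptance matrix can be sketched as a relation recording which symbols can occur as the first element of a unit, directly or through a chain of definitions. The code below is not the algorithm of Irons or Ingerman; it is a plain transitive closure over an assumed dictionary form of the grammar of Figure 1, with sets standing in for the rows of the matrix.

```python
from typing import Dict, List, Set

# Part of the grammar of Figure 1; each alternative is a list of symbols.
GRAMMAR: Dict[str, List[List[str]]] = {
    "VARIABLE":   [["LETTER"], ["VARIABLE", "LETTER"]],
    "INTEGER":    [["DIGIT"], ["INTEGER", "DIGIT"]],
    "FACTOR":     [["VARIABLE"], ["INTEGER"], ["(", "ARITH EXPR", ")"]],
    "TERM":       [["FACTOR"], ["TERM", "MULOP", "FACTOR"]],
    "ARITH EXPR": [["TERM"], ["ARITH EXPR", "ADDOP", "TERM"]],
}


def acceptance(grammar: Dict[str, List[List[str]]]) -> Dict[str, Set[str]]:
    """Map each unit to the set of symbols that can begin it."""
    starts = {name: {alt[0] for alt in alts} for name, alts in grammar.items()}
    changed = True
    while changed:                     # transitive closure of "begins with"
        changed = False
        for firsts in starts.values():
            for sym in list(firsts):
                extra = starts.get(sym, set()) - firsts
                if extra:
                    firsts |= extra
                    changed = True
    return starts


matrix = acceptance(GRAMMAR)
print(sorted(matrix["FACTOR"]))
```

An analyzer can consult such a table before pursuing a goal: if the symbol in hand cannot begin the goal, the prediction is rejected without any further work.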

6.2 A Transition Diagram Algorithm

In  an  effort  to  overcome  problems  of  compilation  speed 
and  error  detection,  Conway  (1963)  developed  a  translation 
system  using  transition  diagrams.  The  structure  of  a  program 
is  obtained  by  starting  at  the  entrance  node  of  a  main  diagram 
(i.e.  PROGRAM)  and  recording  which  lines  and  diagrams  are 
traversed  while  attempting  to  reach  its  exit  node  by  matching 
characters  in  the  input  string  with  terminal  characters  in  the 
diagrams.  This  algorithm  is  of  particular  interest  because  it 
is  being  used  in  the  APL  interpreter. 
6.3 A Multiple Parse Algorithm

Irons (1963b) also developed a multiple parse syntactic
analyzer to improve error detection in syntax directed
compilers. The preprocessing phase determines the terminal
characters, the syntactic units these characters can initialize,
and the units which must be present to complete the initialized
units. The analyzer operates by considering each input symbol
and determining all feasible parses of the string up to that
point.
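The value of retaining every parse can be illustrated by counting the syntax trees a grammar assigns to a string. The sketch below is not Irons’ algorithm, which works through the input one symbol at a time; it simply enumerates all ways of dividing the string among the symbols of each alternative. The grammars and names are assumptions, and the method presumes no empty alternatives and no cycles of single-symbol definitions.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def count_parses(grammar: Dict[str, List[Tuple[str, ...]]],
                 goal: str, text: str) -> int:
    """Count the distinct syntax trees deriving `text` from `goal`."""

    @lru_cache(maxsize=None)
    def symbol(sym: str, i: int, j: int) -> int:
        if sym not in grammar:                   # terminal character
            return 1 if text[i:j] == sym else 0
        return sum(sequence(alt, i, j) for alt in grammar[sym])

    @lru_cache(maxsize=None)
    def sequence(symbols: Tuple[str, ...], i: int, j: int) -> int:
        if len(symbols) == 1:
            return symbol(symbols[0], i, j)
        rest = symbols[1:]
        # Every remaining symbol must cover at least one character.
        return sum(symbol(symbols[0], i, k) * sequence(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return symbol(goal, 0, len(text))


# Hypothetical grammars: the first allows either grouping of A+A+A.
AMBIGUOUS = {"E": [("E", "+", "E"), ("A",)]}
UNAMBIGUOUS = {"E": [("E", "+", "T"), ("T",)], "T": [("A",)]}

print(count_parses(AMBIGUOUS, "E", "A+A+A"))
print(count_parses(UNAMBIGUOUS, "E", "A+A+A"))
```

A count above one signals an ambiguous definition, and a count of zero signals a syntactically incorrect string, which corresponds to the number of surviving parses falling to zero at an error.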

The  three  systems  for  which  models  are  constructed  have 
been  chosen  because  of  their  generality,  type  of  syntax 
specifications  used,  and  explicit  representation  of  the  parses 
provided.  All  methods  use  the  most  general  form  of  phrase 
structure  grammars.  This  reduces  the  number  of  restrictions 
which  must  be  observed  in  specifying  the  syntax  of  a  language 
and  still  permits  the  algorithms  to  be  applied  to  more  restricted 
grammars. The preprocessing algorithms convert syntax
specifications in BNF or Irons’ notation to vectors and matrices which
are closely related to the original form. Consequently it is
easier to understand the operation of the algorithms and develop
the models.

The conventional and transition diagram algorithms produce
the first correct parse which results from the particular order
in which the grammar is specified. The multiple parse algorithm
produces all correct parses. The output of each system can be
converted to tree notation, thus simplifying interpretation of
the parses.


CONCLUSION 


All of the models determined the structure of simple
arithmetic statements. The conventional and multiple parse
algorithms also analyzed statements formed in a subset of
ALGOL. The development and testing of the models produced
general results concerning their construction, syntax
specification requirements, error detection facilities, efficiency, and
usage. The work also served as one evaluation of APL for model
building.

The preprocessing routines for the conventional and
multiple parse algorithms are straightforward. It was
impossible to write algorithms which would construct transition
diagrams for the subset of ALGOL. The matrices produced by
the multiple parse preprocessing routines appear to contain the
necessary information and the problem entails designing diagrams
which satisfy the "No Loop" and "No Backup" conditions. It is
possible to construct the diagrams manually although this was
not done for the ALGOL example. The analysis routines require
careful consideration in their construction as minor changes
can have considerable effect on the routines themselves as
well as the preprocessing routines.

The algorithms are of most value if the syntax is
specified in such a manner that the system designer can
understand it and the algorithms can process it. Although both BNF
and Irons’ notation are readily understood by designers, Irons’
notation is more useful for syntax directed systems. In the
examples considered it is at least as powerful as BNF and is a
more concise representation of the syntax. Although the
conventional routines used BNF they would have been simplified by
the use of Irons’ notation; BNF would not simplify the other
algorithms.

A syntax directed analyzer must determine the structure
of statements efficiently in terms of speed and storage
requirements. No tests were conducted to determine the quantitative
aspects of speed and storage requirements, but qualitative
judgements can be made. The transition diagram algorithm
would  be  the  fastest  as  it  always  determines  part  of  the 
structure  with  each  recognition  of  a  terminal  character.  The 
conventional  algorithm  is  slower  as  it  may  have  to  reconsider 
input  characters  if  a  false  structure  is  attempted.  The  multiple 
parse  algorithm  is  slowest  as  it  must  consider  all  parses.  The 
transition  diagram  algorithm  is  most  efficient  because  the 
structural  connections  in  the  language  are  determined  in  the 
preprocessing  phase  and  not  in  the  analyzer  algorithm. 

The transition diagram algorithm requires the least
storage followed by the conventional and multiple parse systems.
The transition diagram algorithm requires the syntax
specifications and one working vector. Information concerning the parses
need not be stored in the machine as there is no possibility
of false structures. The conventional routines require the
syntax specifications and three working vectors. Information
related to the parses may have to be stored because of false
structures. The multiple parse algorithm requires extensive
tables describing the syntax specifications and storage for
parses n - 1 and n if character n is being considered.
In both the conventional and multiple parse algorithms, auxiliary
storage could be used to retain the parse information. However,
this would decrease the speed of the algorithms.

Not all programs which are subjected to analysis are
correct, and the algorithms must be able to locate the errors.
In this respect the algorithms are essentially equivalent. The
transition diagram and multiple parse algorithms indicate which
input symbol they were considering when they were unable to
produce further output. The conventional algorithm indicates
the character farthest along the input string which it had
processed before coming to an error. No attempts were made
to have the algorithms process the input string beyond the
error condition.
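
The error-location behaviour described for the conventional
algorithm can be pictured with a minimal sketch in Python. This is
not a transcription of the APL routines. It assumes a hypothetical
toy grammar (GRAMMAR) and a hypothetical class (Recognizer) that
tries each alternative in turn, backs up when a false structure is
attempted, and records the farthest input position it has matched.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical toy grammar (an assumption for illustration only).
# Each syntactic unit maps to its alternatives; any symbol that is
# not a key of the dictionary is treated as a terminal character.
GRAMMAR: Dict[str, List[List[str]]] = {
    "expr": [["term", "+", "expr"], ["term"]],
    "term": [["a"], ["(", "expr", ")"]],
}


class Recognizer:
    def __init__(self, grammar: Dict[str, List[List[str]]]) -> None:
        self.grammar = grammar
        self.farthest = 0  # farthest input index matched so far

    def match(self, symbol: str, text: str, pos: int) -> Iterator[int]:
        """Yield every position reachable after recognizing symbol at pos."""
        if symbol not in self.grammar:  # terminal character
            if pos < len(text) and text[pos] == symbol:
                self.farthest = max(self.farthest, pos + 1)
                yield pos + 1
            return
        for alternative in self.grammar[symbol]:  # back up between alternatives
            yield from self.sequence(alternative, text, pos)

    def sequence(self, symbols: Sequence[str], text: str, pos: int) -> Iterator[int]:
        """Recognize the symbols one after another, starting at pos."""
        if not symbols:
            yield pos
            return
        for mid in self.match(symbols[0], text, pos):
            yield from self.sequence(symbols[1:], text, mid)

    def analyze(self, goal: str, text: str) -> Tuple[bool, int]:
        """Return (True, length) on success, else (False, index of the error)."""
        self.farthest = 0
        for end in self.match(goal, text, 0):
            if end == len(text):
                return True, end
        return False, self.farthest
```

For the string a+(a+) the sketch reports index 5, the closing
parenthesis, which is the character farthest along the input that
could not be fitted into any structure. The grammar must not be
left recursive, since the sketch has no preprocessing step
comparable to that of CHEATMOD.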

The algorithms could be used in any situation requiring
the determination of structure, provided a suitable means of
coding the input could be devised. The particular algorithm
used would depend on the requirements of the problem. The
transition diagram system offers the best syntax directed
analyzer. Until algorithms which will produce the diagrams
from the syntax specifications are available, much of the
flexibility of such systems will be missing. The development
of these preprocessing routines would constitute a significant
achievement in this field.

The  construction  of  models  using  APL  in  a  timesharing 
environment  produced  better  returns  than  could  be  expected  from 
conventional,  batch  processed  programming  languages.  The 
powerful  instruction  set  reduced  the  amount  of  coding  required 
and thus reduced the number of mechanical errors. The
timesharing environment permitted rapid correction of the errors
which  did  occur.  APL  would  be  more  useful  for  such  work  if 
it  had  an  iterative  instruction,  facilities  to  insert  comments, 
and  a  better  arranged  listing  of  the  programs.  These  features 
would  make  it  easier  to  study  and  understand  the  models. 

Syntax  directed  analysis  started  as  a  means  to  analyze 
the  structure  of  statements  in  a  programming  language.  Before 
syntax  directed  compilers  will  compete  with  conventional 
compilers  many  problems  must  be  overcome.  Consequently,  the 
first  large  scale  use  of  syntax  directed  analysis  may  be  in 
the  solution  of  non-compiler  problems.  If  there  is  a  demand 
for  programming  languages  with  complex  grammars,  syntax 
directed  analyzers  can  provide  the  basis  for  simple,  flexible 
compilers.


BIBLIOGRAPHY 


Caracciola di Forino, A., 1963, "Some remarks on the syntax
    of symbolic programming languages", Comm. ACM,
    6/8, pp. 456-460.

Cheatham, T.E. Jr. and Sattley, K., 1964, "Syntax-directed
    compiling", AFIPS Conf. Proc., vol. 25, pp. 31-57.

Chomsky, N., 1959, "On certain formal properties of grammars",
    Information and Control, vol. 2, pp. 137-167.

Conway, M.E., 1963, "Design of a separable transition-diagram
    compiler", Comm. ACM, 6/7, pp. 396-408.

Davis, R.M., 1966, "Programming language processors",
    Advances in Computers, ed. Alt, F.L., Academic
    Press, New York: vol. 7, pp. 117-180.

Eickel, J., Paul, M., Bauer, F.L., and Samelson, K., 1963,
    "A syntax controlled generator of formal
    language processors", Comm. ACM, 6/8, pp. 451-455.

Evans, A., 1964, "An ALGOL-60 compiler", Ann. Rev. in Auto.
    Prog., vol. 4, pp. 87-123.

Feldman, J.A., 1966, "A formal semantics for computer languages
    and its applications in a compiler-compiler",
    Comm. ACM, 9/1, pp. 3-12.




Floyd, R.W., 1962, "On ambiguity in phrase structure
    languages", Comm. ACM, 5/10, pp. 526-534.

Floyd, R.W., 1963, "Syntactic analysis and operator
    precedence", JACM, 10/3, pp. 316-333.

Floyd, R.W., 1964a, "Bounded context syntactic analysis",
    Comm. ACM, 7/2, pp. 62-65.

Floyd, R.W., 1964b, "The syntax of programming languages
    - a survey", IEEE Transactions on Electronic
    Computers, vol. EC-13, pp. 346-353.

Galler, B.A. and Perlis, A.J., 1967, "A proposal for
    definitions in ALGOL", Comm. ACM, 10/4,
    pp. 204-219.

Gorn, S., 1961a, "Some basic terminology connected with
    mechanical languages and their processors",
    Comm. ACM, 4/8, pp. 336-339.

Gorn, S., 1961b, "Specification languages for mechanical
    languages and their processors - A baker's dozen",
    Comm. ACM, 4/9, pp. 532-542.

Graham, R.M., 1964, "Bounded context translation", AFIPS
    Conf. Proc., vol. 25, pp. 17-29.




Greibach, S., 1965, "A new normal-form theorem for
    context-free phrase structure grammars", JACM, vol. 12,
    pp. 42-52.

Hamblin, C., 1962, "Translation to and from polish notation",
    Comput. J., vol. 5, pp. 210-213.

Ingerman, P.Z., 1966, A Syntax-Oriented Translator, Academic
    Press, New York: 131 pp.

Irons, E.T., 1961, "A syntax directed compiler for ALGOL
    60", Comm. ACM, 4/1, pp. 51-55.

Irons, E.T., 1963a, "The structure and use of the syntax
    directed compiler", Ann. Rev. in Auto. Prog.,
    ed. Goodman, R., MacMillan Comp., New York:
    vol. 3, pp. 207-228.

Irons, E.T., 1963b, "An error correcting parse algorithm",
    Comm. ACM, 6/11, pp. 669-673.

Irons, E.T., 1964, "Structural connections in formal languages",
    Comm. ACM, 7/2, pp. 67-72.

Iverson, K.E., 1962, A Programming Language, John Wiley and
    Sons, Inc., New York: 286 pp.

Iverson, K.E., 1964, "A method of syntax specification",
    Comm. ACM, 7/10, pp. 588-589.




Kirsch, R.A., 1964, "Computer interpretation of English
    text and picture patterns", Trans. IEEE,
    vol. EC-13, pp. 363-376.

Kuno, S., 1966, "The augmented predictive analyzer for
    context-free languages - Its relative efficiency",
    Comm. ACM, 9/11, pp. 810-823.

Kurki-Suonio, R., 1966, "On top-to-bottom recognition and
    left recursion", Comm. ACM, 9/7, pp. 527-528.

Landweber, P.W., 1964, "Decision problems of phrase-structure
    grammars", IEEE Trans., vol. EC-13, pp. 354-362.

Leavenworth, B.M., 1966, "Syntax macros and extended
    translation", Comm. ACM, 9/11, pp. 790-793.

Metcalfe, H.H., 1964, "A parameterized compiler", Ann. Rev. in
    Auto. Prog., ed. Goodman, R., MacMillan Comp.,
    New York: vol. 4, pp. 125-167.

Narasimham, R., 1966, "Syntax-directed interpretation of classes
    of pictures", Comm. ACM, 9/3, pp. 166-173.

Schorr, H., 1965, "Analytic differentiation using a syntax-
    directed computer", Comp. J., vol. 7, pp. 290-298.

Taylor, W., Turner, L., and Waychoff, R., 1961, "A syntactical
    chart of ALGOL 60", Comm. ACM, 4/5, pp. 393-394.




Williams, P.A., 1959, "Handling identifiers as internal symbols
    in language processors", Comm. ACM, 2/6, pp. 21-24.

Wirth, N., and Weber, H., 1966, "EULER: A generalization of
    ALGOL, and its formal definitions", Comm. ACM,
    Part 1 9/1, pp. 13-25, Part 2 9/2, pp. 89-99.


APPENDIX  A 


DESCRIPTIONS  AND  LISTING  OF  ROUTINES 

Figure  7  indicates  which  subroutines  are  required  by 
other  functions. 

CONVENTIONAL  ROUTINES 

ANALYZE  PROG 

This  algorithm  determines  the  structure  of  an  input 
string  PROG  by  using  the  vectors  set  up  by  CHEATMOD. 

CHEATMOD  RULE 

This  algorithm  develops  a  set  of  interrelated  vectors 
from  the  numeric  representation  of  the  syntax  specifications 
RULE.  These  must  be  in  BNF. 

Syntax Type Table - one entry for each syntactic unit.

INDEX - the numerical representation of the syntactic
        type.

TERM  - = 0 if the unit is a non-terminal character.
        = 1 if the unit is a terminal character.

LKFR  - if TERM = 0 points to row in structure table
        where definition of this unit began.
        if TERM = 1 same value as INDEX.

Syntax Structure Table - one entry for each constituent
        of a definition.

TYCD  - numeric value of element in definition.

STRC  - = 0 definition can not be considered complete.
        = 1 definition can be considered complete.

SUCC  - points to row of structure table which can come
        next.

ALTR  - = 0 no alternate to this unit.
        = 1 alternate possible but not necessary.
        > 0 row of structure table which may replace
            this one.

CHEATOUT 

This function displays the vectors prepared by CHEATMOD.

INGMOD 

This function prepares vectors ROW and COL and an array
MATRIX which indicates what terminal symbols given in ROW
can occur in the metaresults given in COL. The definitions
are provided by the matrix RULE which is the numeric
representation of the syntax specifications in BNF.
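
The relation tabulated in MATRIX can be pictured as a closure over
the rules. The following is a minimal sketch in Python based on one
reading of the description above, not a transcription of INGMOD. It
assumes a hypothetical dictionary form of the grammar (RULES) in
place of the numeric matrix RULE, and the function name
terminals_in is likewise an assumption.

```python
from typing import Dict, List, Set

# Hypothetical grammar: unit -> alternatives. Symbols that are not
# keys of the dictionary are treated as terminal symbols.
RULES: Dict[str, List[List[str]]] = {
    "number": [["digit"], ["digit", "number"]],
    "digit": [["0"], ["1"]],
    "sum": [["number", "+", "number"]],
}


def terminals_in(rules: Dict[str, List[List[str]]]) -> Dict[str, Set[str]]:
    """For each syntactic unit, collect the terminals that can occur in it."""
    occurs: Dict[str, Set[str]] = {unit: set() for unit in rules}
    # Direct entries: terminals written explicitly in a definition.
    for unit, alternatives in rules.items():
        for alternative in alternatives:
            occurs[unit].update(s for s in alternative if s not in rules)
    # Carry-overs: a unit inherits the terminals of every unit it mentions.
    changed = True
    while changed:
        changed = False
        for unit, alternatives in rules.items():
            for alternative in alternatives:
                for s in alternative:
                    if s in rules and not occurs[s] <= occurs[unit]:
                        occurs[unit] |= occurs[s]
                        changed = True
    return occurs
```

An analyzer can consult such a table before attempting a goal: if
the current input symbol cannot occur in the goal at all, the
attempt can be abandoned without building a false structure.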

INGMODOUT  MATRIX 

Outputs  the  vectors  ROW  and  COL  and  the  array  MATRIX. 


∇ANALYZE[⎕]∇

∇ ANALYZE PROG

[1]  GSTACK*\  0 

[2]  SS2V4(7K«-iO 

[3]  CSTA  CK* 1  0 

[4]  PR 0 G T*I DEN TPR 0 GR A M  PROG ,'  ?  * 

[5]  GOAL* 21 

[6]  LSTCHAR* 1 

[7]  SOURCE* 0 

[8]  CHAR* 1 

[9]  ^M2 

[10]  AN Al: GO AL*INDEX\TYCDl SOURCE! 

[11]  ANA2 :*(TERMl GOAL! =1) /ANAS 

[  12  ]  ->(^r/?IX[/?C)F/[  PROGTl  CHAR ]  ]  ;  COL[I//Z)E’J[  6£ML]  ]  ]  =  0  )  A4tfi46 

[13]  LOAD  'G' 

[14]  LOAD  'S' 

[15]  LOAD  ' C' 

[16]  SOURCE*LKFRlGOAL! 

[17]  +ANA1 

[18]  ANAS :+(LKFRlGOAL!*PROGTlCHAR! )/ANA 6 

[19]  (  ( 1  +  p  GSTACK)  ;  ’  *  »  (  (  *  »  ) /!T«-  ,  CODEl  CHAR  ;  ]  )  ,  ’  TERMINAL  ') 

[20]  CHAR*CHAR+ 1 

[21] 

[22]  +ANA 10 

[23]  4/!M6:+(S6>t//?(7£  =  0)/4/!M15 

[24]  *(ALTRlSOURCE!  =  1)/MM2 

[25]  ^(^L7,/?[50f//?C£,]^0)/yl^8 

[26]  UNLOAD  ' G ' 

[27]  UNLOAD  'S' 

[28]  UNLOAD  ' C ' 

[29]  -*,4/M6 

[30]  ANA8  :SOURCE*ALTRlSOURCE! 

[31]  -+ANA1 

[32]  ANA10  :+(STR Cl  SOURCE! =0) /ANA13 
[3  3]  ANA  1 1  :+(£[/£<?[  S<9t//?<7£]=0  )/ANA12 

[34]  4/IM13  :S£tf/?C'£,+S,yC<7[S££//?C,E] 

[35]  +ANA 1 

[36]  4  JIM  12  '.UNLOAD  '  G' 

[37]  UNLOAD  'S' 

[38]  CSTACK 

[39]  (1  +  p GSTACK  \ '  ' , ( T* '  ' ) /T*  t EXTERN ALl IN DE Xl GOAL! ; ] ) 

[40]  -+(SOURCE*0  ) /ANA10 

[41]  -+ 0 

[42]  AN Alb : ( 'ERROR  ' \LSTCHAR) 

V 


∇CHEATMOD[⎕]∇


∇ CHEATMOD RULE


[ 1 ]  TERM+LKFR+INDEX + i 0 

[2]  TYCD+-S TR C+-SUCC+-ALTR+-LFRC*-  \  0 

[3]  O'  BEGIN  SWEEP  THRU  RULES' 

[4]  I«-0 

[5]  NST+ 1 

[6]  C^' CHECK  FOR  UNPROCESSED  LEFT  RECURSION ' 

[7]  CHTM1  :  ->(  (  p LFRC)  *0  )  / CHTM3 

[8]  +( (pRULE)lll<I+I+l)/CHTM15 

[9]  LFT+RULEII-,  2] 

[10]  FSTCM+NST 

[11]  C<-  ’  SET  UP  TYPE  TABLE  ' 

[12]  LKFR+LKFR ,NST 

[13]  TERM*- TERM,  0 

[14]  INDEX+-INDEX  ,LFT 

[15]  LSTCM+-  i  0 

[16]  C<-'  PREPROCESS  THIS  RULE' 

[17]  LFR+-0 

[18]  FylXX+-+/Ft/XF[I;l  +  iFFXF[J;l]-l]e  1  2 

[19]  J+3 

[20]  CHTM2  0  :  -*(  F£/XF[  I  ;  1  ]  <X^X+ 1  )  /  CHTM2 1 

[21]  -*(Ff/LF[I  ;J-]=2)/Cff!ZW2  0 

[22]  -►(  ~(  2 >RULElI ; X-  1  ]  ) *LFT=RULE[I  ;X]  )  / CHTM 2  0 

[23]  XFP^X 

[24]  +CHTM2Q 

[25]  CHTM 2  1  :  NRALT+-NALT- LFR*0 

[26]  C^' PROCESS  RULE  I' 

[27]  X«-3 

[28]  NPALT+ 0 

[29]  CHTM2  °.-+(RULE[_I  ;  X+-X+ 1  ]  =  2  ) /CHTM2 

[30]  C^' CHECK  FOR  LEFT  RECURSION' 

[31]  -y(J=LFR)  /CHTM  4 

[32]  O’  PREPROCESS  THIS  COMPONENT' 

[3  3]  OWP«-F£/XF[ I  ;J] 

[34]  TYCD+TYCD ,C0MP 

[35]  0 

[36]  NST+-NST+1 

[37]  C+' CHECK  FOR  LAST  COMPONENT ' 

[38]  +(  (  NEXT+-RULEI I ;  X+ 1  ]  )  <2  )/CHTM3 

[39]  C^' NOT  LAST  COMPONENT ' 

[4  0]  STRC+STRC ,  0 

[41]  SUCC+SUCC ,NST 
[4  2]  -y  CHTM  2 

[43]  C+'LAST  COMPONENT ' 

[44]  CHTM 3 : LSTCM+LS TCM ,NST- 1 

[45]  FP^LTW/PAXX+l 


[46]  STRC+STRC,  1

[47]  SUCC+SUCC ,0 

[48]  C+- '  CHECK  FOR  LAST  ALTERNATIVE' 

[49]  +{NEXT=0) /CHTM1 

[5  0]  C*-' CHECK  FOR  LAST  NONRECURSIVE  ALTERNATE' 

[51]  +(NPALT=NRALT) /CHTM2 

[52]  FSTCM+ALTRIFSTCM~\+-NST 

[53]  +CHTM 2 

[54]  C+' DETERMINE  EXTENT  OF  LEFT  RECURSIVE  DEFN  ’ 

[55]  CHTMH : LFRC+LFRC ,J 

[56]  CRTM1 :  ■*(  (  T-  0  )  ,  (  T=  2  )  ,2 <T+RULEZI  ;J+J+ 1]  )  / CHTM8  ,  CHTM2  ,  CHTMb 

[57]  C^' PROCESS  LEFT  RECURSION ' 

[58]  CHTM8 : K+l 

[59]  RANDAN ST 

[60]  CHTM9  :K+-K+1 

[61]  TYCD+-TYCD,  TEMP+RULElI  iLFRClKll 

[62]  ALTR+ALTR, 0 

[63]  NST+NST+ 1 

[64]  C^' CHECK  FOR  LAST  COMPONENT ' 

[6  5]  +{K=pLFRC) /CRTM11 

[66]  SUCC+SUCC ,NST 

[67]  STRC+STRC ,0 

[68]  ALTR+ALTR , 0 

[69]  -+CHTM9 

[70]  C+'LAST  COMPONENT ' 

[71]  CHTM11 :SUCC*-SUCCtHAND 

[72]  STRC+STRC,  1 

[7  3]  StfC,C,[LSTCM]<H7i4ffD 

[74]  j4L27?[tfi4ffZ?]«-  1 

[75]  LFRC+- 1  0 

[76]  +CHTM1 

[77]  C^' COMPLETE  TYPE  TABLE ’ 

[78]  CHTM1  5  : T-t-4 

[79]  CHTM1 7  :-*■(  (  p  EXTERN  AL  )l  ll  <I+-I +  1  )/ CHTMlb 

[80]  ->(l€J^£,X)/CH!TM17 

[81]  INDEX+INOEX ,1 

[82]  TERM*- TERM,  1 

[83]  LKFR+LKFR ,T 

[84]  -+CRTM11 

[85]  CHTM1 6 : ' FINI ' 


∇CHEATOUT[⎕]∇


∇ CHEATOUT

[1]  CONTROL+U 

[2]  I+CONTROLl 2]-l 

[3]  ->(C0NTR0Llll=2) /CHT02 

[4]  STOP+1 /CONTROLS 3] ,pTERM 

[5]  '  SYNTAX  TYPE  TABLE ' 

[6]  ’  ’ 

[7]  ' NUM  TYPE  INDEX  TERM  LKFR ' 

[8]  CHT01  °.->(STOP<I<-!+l  ) /O 

[9]  ( X  ;  '  ’  J]  ;  ]  s  (  3p  ’  *  )  ;  INDEXl  I  ]  ,  X£7?M[  J 

]  ,  XZFi?[  J]  ) 

[10]  +CHTO 1 

[11]  CHT02:ST0P+l / CONTROL [ 3] ,pTYCD 

[12]  *  SYNTAX  STRUCTURE  TABLE' 

[13]  ’  » 

[14]  'INDEX  TYCD  STRC  SUCC  ALTR ' 

[15]  CHT03  :+(STOP<I+I+l  )/0 

[16]  (  ’  *  ;  (I,2,7CD[J]  .STRClIl  ,SUCClIl  ,ALTRlI ]  )  ) 

[17]  -+CRTO  3 

V 


∇INGMODOUT[⎕]∇
∇ INGMODOUT MATRIX

[I]  'COLUMNS' 

[  2  ]  I-<-0 

[3]  ING02i+(  (pMATRIX)L  2]<I«-I  +  1  MINGO  1 

[4]  ( X ;  ’  ’  ,FXX£7?/1MX[  COL\I;  ]  ) 

[5]  +ING02 

[6]  INGOliI+O 

[7] 

[8]  'ROWS' 

[9]  IN  GO  3  :  (  p  AM  77?  J  J)[1]<J^J  +  1  )/INGO  4 

[10]  TEMP*-  (  ( J -ROW )  /  i  pi?CW )  ,  i  0 

[II]  e7«-0 

[12]  XFGO  5  :  ->(  (  p  TEMP )  <  J^X+ 1  )  /Xtf  GO  3 

[13]  ( I 'AT*'  '  )  /  T+EXTERNALl  TEMPI  J]  ;]) 

[14]  +INGOS 

[15]  INGO^  :  NR*-(  pMA  TRIX )  [  1  ]  + 1 

[16]  ff<7«-(ptfi427?X*)[2] 

[17]  C«-0 

[18]  ’ MATRIX ' 

[19]  $(  (tfC+l  )  tNR)p(  (  l)  +  iiVi?)>^(OT,^)p(i^),  MATRIX 


V 


∇INGMOD[⎕]∇
∇ INGMOD

Cl]  ROW+COL+(  NT+(  ^EXTERNAL)  [  1]  )pO 

[2]  NR+NC+ 0 

[3]  COM P+0 

[4]  COLlTl+xNC+pT+RULEl ;2] 

[5]  /?£[/[  !T]«-i  W?«-p2’«-4  +  1^7-4 

[6]  MATRIX+(NR , NO) pO 

[7]  I«-0 

[8]  77£’5T^-i4 

[9]  INGM1  :+(  (  pRULE)  [1]  <I«-I+1)  /INGM6 

[10]  J«-3 

[11]  INGM 2  :-Ki?£/L£[I  ;«W+1  ]  =  0  )  /INGM1 

[12]  +  (P£/L£[X;^]€mST)/I/l/GM2 

[13]  M4XPIJ[P0J/[P[/L2?[I;e7]]  ;  COXCtftfLtfC  J  ;  2  ]  ]  >1 

[14]  +INGM 2 

[15]  C+' COMPRESS  IDENTICAL  ROWS' 

[16]  INGM6 : 1+0 

[17]  INGM7 :+(NR<I+I+l) /INGM8 

[18]  </<-! 

[19]  INGM9  :  ■+(  NR  <J+J+1  )  /INGM1 

[20]  +(  v  //^  27?I  J[  J  ;  ]  *MA  TRIXL  J  ;  ]  )  /INGM 9 

[21]  P0J/[  (J=i?0f/)/ipi?CW]+-I 

[22]  -+INGM9 

[23]  INGM 8:1+0 

[24]  T+L/ROW 

[25]  TEMP+ \ 0 

[26]  INGM10  :J+(  (  ( S«*L  /  (  R0W>  0  )  /P0V )  =R0W )  /  i  pi?<9f/ )  ,  iO 

[27]  TEMP+TEMP .MATRIXlROWlJl 1] ] ; ] 

[28]  i?CW[<7>I«-I-l 

[29]  MS*T)  /INGM10 

[30]  NR+( -I ) 

[31]  ROW+ ( -ROW ) 

[32]  MA TRIX+ (NR, NC) p  TO/P 

[33]  -*•(  COMP=  1  )  / 0 

[34]  C+'FILL  IN  CARRY  OVERS' 

[35]  J+0 

[36]  INGM1 2 :+(NC<J+J+l )/IN GM11 

[37]  1+0 

[ 38]  INGM A  3 : +(NR<I+I  +  1 ) /INGM12 

[39]  +(~MATRIXII  W]  )/INGM13 

[40]  TEMP+ROWLCOL\Jl 

[41]  +( TEMP -I ) /INGM 1 3 

[42]  MA TRIXL I ;1+MATRIXII ; IvMATRI XL  TEMP ; ] 

[43]  +INGM 13 

[44]  INGMll:TERMA+(~( \NT)e( i4) .RULE [ ;2 1)/\NT 

[45]  TEMP+ ( i NR)e ROWL TERM A ] 

[46]  MA TRIX+ TEMP /[ 1]  MATRIX 

[47]  ROW+( TEMP / ( i NR ) ) \ROW 

[48]  NR+ ( pMA TRI X ) [ 1 ] 

[49]  ROWL  (~(  \NT)  eTERMA)  / \NT~\  +  0 

[50]  COMP+ 1 

[51]  +INGM6 


V 


TRANSITION  DIAGRAM  ROUTINES

TABLE  DIAGRAM  PROG 

This algorithm determines the structure of the input
string PROG by using the diagrams supplied in TABLE.

TABLE is an N × 6 element matrix where N is the
number of lines in the transition diagrams. The columns of
TABLE contain the following information:

1.  The numerical representation of the syntactic
    unit being defined.

2.  The initial node of the ith connecting line.

3.  The end node of the ith connecting line.

4.  = 1  ith line represents syntactic unit.
    = 0  ith line represents terminal character.

5.  > 0  numerical representation of syntactic unit
         or terminal character.
    = 0  open path.

6.  = 0  non-exit node.
    = 1  exit node.
    < 0  multiple exit nodes.
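
How a table of this shape can drive an analysis is sketched below
in Python. This is a simplified illustration, not the DIAGRAM
routine of the listing. It assumes that terminal characters are
stored directly in column 5, that the first line listed for a unit
starts at its entry node, and that a 1 in column 6 means the
diagram is complete once that line has been traversed. Negative
values in column 6 (multiple exit nodes) are not handled. The name
traverse and the sample TABLE are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, int, int, object, int]  # the six columns described above

# Hypothetical diagrams: unit 1 is a string of d's, unit 2 is such
# strings joined by "+".
TABLE: List[Row] = [
    (1, 1, 2, 0, "d", 0),  # unit 1: node 1 -> node 2 on terminal "d"
    (1, 2, 2, 0, "d", 0),  # unit 1: node 2 loops on further "d"
    (1, 2, 3, 0, 0, 1),    # unit 1: open path, exit
    (2, 1, 2, 1, 1, 0),    # unit 2: node 1 -> node 2 on syntactic unit 1
    (2, 2, 1, 0, "+", 0),  # unit 2: "+" returns to node 1
    (2, 2, 3, 0, 0, 1),    # unit 2: open path, exit
]


def traverse(table: List[Row], unit: int, text: str, pos: int = 0) -> Optional[int]:
    """Follow the diagram for unit from pos; return the new position or None."""
    lines = [row for row in table if row[0] == unit]
    node = lines[0][1]  # assumption: the first line starts at the entry node
    while True:
        for _, start, end, is_unit, symbol, is_exit in lines:
            if start != node:
                continue
            if symbol == 0:  # open path: traversed without reading input
                new_pos: Optional[int] = pos
            elif is_unit:  # line labelled with another syntactic unit
                new_pos = traverse(table, symbol, text, pos)
            elif pos < len(text) and text[pos] == symbol:
                new_pos = pos + 1  # terminal character recognized
            else:
                new_pos = None
            if new_pos is not None:
                pos, node = new_pos, end
                if is_exit:
                    return pos
                break
        else:
            return None  # no line leaving this node can be traversed
```

The sketch tries the lines leaving a node in table order and
follows the first one that can be traversed, so the ordering of
the rows matters: the open path is listed last so that it is taken
only when no symbol matches.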

CMP  DIAGALTERNATE  INITNODE 

A recursive function which prepares the diagrams for
DIAGRAMAUTO. The recursive feature is used when two or more
definitions exist for one syntactic unit.

TABLE  <-  DIAGRAMAUTO

This  function  prepares  a  transition  diagram  for  each 
rule  from  a  set  of  syntax  specifications  in  Irons’  Notation. 

TABLEO  <-  DIAGRAMCORR  TABLE

This function is used to replace or insert rows into
TABLE.

TABLEO  <-  DIAGRAMMOD  TABLE 

This  algorithm  takes  a  TABLE  prepared  by  DIAGRAMREAD 
or  DIAGRAMAUTO  and  crossreferences  the  individual  diagrams 
as  well  as  converting  the  linkages  in  the  individual  diagrams 
to  linkages  in  terms  of  the  entire  set  of  diagrams. 

DIAGRAMOUT  TABLE 

This function lists the transition diagrams contained
in TABLE.

TABLE  <-  DIAGRAMREAD 

This  routine  will  read  and  store  the  numeric  representation 
of  transition  diagrams  which  have  been  prepared  manually. 


∇DIAGALTERNATE[⎕]∇

∇ CMP DIAGALTERNATE INITNODE

[  1]  -*(  2eTEMP<-RULEL  IRL;  (CMP<TEMP) /TEMP+\  (  pRULE)L  2]]  )  /DIAGL1 

[2]  DIAGL2  i+LRULEL  IRL;  CMP']  e  0  2  )  /DIAGLQ 

[3]  +(RULEl IRL;CMP]=^) /DIAGLS 

[4]  INODE+INITNODE 

[5]  FNODE+NODE+ 1 

[6]  +(RULELIRL;CMP1*3) /DIAGLQ 

[7]  LOOP+NODE 

[8]  -+DIAGL5 

[9]  DIAGLQ :+(RULEL IT?  L  ;  CWP+ 1  ]  *  4 ) /DIAGL6 

[10]  NODE+LOOP- 1 

[11]  FNODE+-LOOP 

[12]  PZT4GL6  :AMMF«-Z?Z/LF[ IFLjCMP] 

[13]  SYNUIND+-NAME  eRULEL  ;  2  ] 

[14]  EXIT+0 

[15]  -*•(  — i?£/Z/£’[  TFL  ;  CAfP+l  ]  €  0  2)/DIAGLl 

[16]  EXIT+ 1 

[17]  P  Ji4  GL  7  :  LI  STYLIST,  SYNU ,  ItfOPF ,  FFPPF ,  SYNUIND  ,  NAME , 

[18]  INITNODE+NODE+NODE+l 

[19]  DIAGL5  :  CMP+-CMP+1 

[20]  -+DIAGL2 

[21]  DIAGL1 : CMPT+1+ CMP+TEMP \ 2 

[22]  CMPT  DIAGALTERNATE  INITNODE 

[23]  +DIAGL2 

[24]  DIAGLQ i LI STl p LI STl  =1) /0 

[25]  LI ST+LI ST , SYNU , LOOP ,  (LOO P+1 ) ,  0  0  1 

V 


∇DIAGRAM[⎕]∇
∇ TABLE DIAGRAM PROG;PT

[1]  C+' INITIALIZE  STACK  AND  CONVERT  PROGRAM' 

[2]  PRO GT+I DEN TPRO GRAM  PROG,'  ?  ' 

[3]  STACKS 2  3,  lO 

[4]  FL/4S<-P7^0 

[5]  C^' ADVANCE  POINTER  FOR  NEW  SYMBOL' 

[6]  DIAGST:INSYM+-PROGTlPT*-PT+l] 

[7]  DI AGSU  :  STCKTO P«- VALUE  STACK 

[8]  C*-' CHECK  FOR  SYMBOL  DETERMINATION' 

[9]  +(TABLELSTCKTOP ;4]=0) /DIAGSY 

[10]  C+'ADD  NEW  UNIT  TO  STACK' 

[11]  STACK+STACK  ,TAB LEl STCKTOP ;  5] 

[12]  -+DIAGSU 

[13]  C*-' CHECK  FOR  OPEN  PATH' 

[14]  DIAGSY :+( TABLEL STCKTOP ; 5 ] = 0 ) /DIAGTRE 

[15]  C^' CHECK  FOR  SYMBOL  MATCH' 


[16]  +( INSYM=TABLEl STCKTOP; 5] ) /DIAGTRS

[17]  C+-  '  CHECK  FOR  ALTERNATE  PATH  ' 

[18]  DIA GNN : TEMP*-( TABLEl STCKTOP ; 1 ] =  3MPZ,Z?[ ;  1  ]  )  /  i  (  p  TVISLF  )  [  1 

] 

[19]  (  TEMP*-  { STCKTOP <  TEMP  )  /  TEMP  )  ;  2]  i7\4BLE[ 
STCKTOP; 2] 

[20]  -+( ALTN  >  p  TEMP )  /DIAGFL 

[21]  pt4ca[ppt>ica>tpmpul2W] 

[22]  -*-Z?Ii4GS£/ 

[2  3]  C+'UNIT  PATH  TRAVERSED ' 

[24]  Z?I;4G27?tf  sFLdG+o 

[25]  +DIAGCHK 

[26]  C^' EMPTY  PATH  TRAVERSED' 

[27]  PI4CT.PF:FL,4C«-0 

[28]  -+DIAGCHK 

[29]  O' SYMBOL  PATH  TRAVERSED 1 

[30]  DIAGTRS: ( ( 1+pSTACK) ; ’  ’ , ( ( T* '  ' ) /T+CODEl PT ; ] ) , ’  TERM 

INAL  '  ) 

[31]  FLAC«-1 

[32]  C^' CHECK  FOR  DIAGRAM  EXIT ' 

[33]  DIAGCHK  :  ->(  T4Z?LF[  STCKTOP ;  6  ]  *  0  )  /DIAGEX 

[34]  ST4CK[pST4CK]«-T4BLF[ STCKTOP;  3] 

[35]  d>* DETERMINE  RESTART  POINT ' 

[36]  +  (PL4C,~FL4C)/PI4CST,PI,4CS£/ 

[37]  C«- '  DIAGRAM  HAS  BEEN  TRAVERSED ' 

[38]  DIAGEX  :  (  (  pSTACK)  ;  1  '  ,  (  T*  '  '  )  /T«-  ,KKTK/?ilML[  T4BLF[ 

STCKTOP  ;1]  ;  ]  ) 

[39]  C^' CHECK  FOR  MULTIPLE  EXITS' 

[4  0]  +(  77/lSLP[5!Z,C'A!rOP;  6]<0)  /DIAGMP 

[41]  PIACCS:ST4CK-e£/FST,4CK  ST4CK 

[42]  IKS7M«-Pi?0GT[PT«-PT+FL;4G] 

[4  3]  C*-' CHECK  FOR  PROGRAM  OR  ERROR' 

[44]  +( ( (pSTACK)=0) APT=pPROGT) /DIAGPR 

[45]  -►(  (  pST^CK)  =0) /DIAGERR 

[46]  STCKTOP+VALUE  STACK 

[47]  +DIAGTRU 

[48]  C*-'  RESET  STACK  BECAUSE  OF  MULTI  EXITS' 

[49]  DIAGMP:  STACKS  1  +  p  S  TA  CK  ]  +S  TA  CK  [  l  +  pST4CK]  +  (  1 )+  |  TABLE 
l STCKTOP ; 6  ] 

[50]  -+DIAGUS 

[51]  C+'NO  ALTERNATE  PATHS' 

[52]  DIAGFL:  ST ACK+UN STACK  STACK 

[53]  +  (  (PST>lCK)=O)/0Ii4CK/?/? 

[54]  STCKTOP*- VALUE  STACK 

[55]  +DIAGNN 

[56]  C+- '  PROGRAM  IS  SYNTACTICALLY  CORRECT' 

[57]  P_L4CPF:-*0 

[58]  C*-'  PROGRAM  HAS  ERROR  ' 

[59]  DIA GERR : ' ERROR ' 

[60]  PROG 

[61]  (((PT+ l)p’ 

[62]  +  0 
V 


∇DIAGRAMAUTO[⎕]∇

∇ TABLE←DIAGRAMAUTO

[1]  TABLE←⍳0

[2]  IRL←0

[3]  DIAGA1:→((⍴RULE)[1]<IRL←IRL+1)/DIAGA2
[4]  SYNU←RULE[IRL;2]

[5]  CMP←4

[6]  INITNODE←NODE←1

[7]  LIST←⍳0

[8]  CMP DIAGALTERNATE NODE

[9]  TABLE←TABLE,LIST

[10]  →DIAGA1

[11]  DIAGA2:TABLE←(((⍴TABLE)÷6),6)⍴TABLE

∇


∇DIAGRAMCORR[⎕]∇

∇ TABLEO←DIAGRAMCORR TABLE;ROW;CORR;ROWA;ROWB;TAB;NT

[1]  DIAC2  iROWt-U 

[2]  -*(  0=ROW) /DIAC3 

[3]  CORR+-  [ 

[4]  ->(  (  LROW)<ROW) /DIAC1 

[5]  TABLElROW ;l+CORR 

[6]  +DIAC 2 

[7]  PIAC1 :ROWB+( pTABLE)l ll -ROWA+lROW 

[8]  NTt-pTABt-  ,  TABLE 

[9]  TABLED ( ( ( p  TAB ) v6 ) ,  6)pTAB+( ( NTa6*ROWA ) / TAB ) .CORR, (NTu 
6*ROWB) /TAB 

[10]  -+DIAC  2 

[11]  DIAC3  :  TAB LEOt-TAB LE 

V 

∇DIAGRAMMOD[⎕]∇

∇ TABLEO←DIAGRAMMOD TABLE;I

[1]  TEMP+TABLEt  ; 4 ] \ TABLEl ;l]i TABLEl ; 4 ] /TABLED  ;5] 

[2]  TABLEl  ;5>(  (~ TABLEl  ;4]  )*TABLEl  ;5]  )+TEMP 

[3]  It-  0 

[4]  PIAMPl:-*(  (  p  TABLE  )  [  1  ]  <It-I  + 1  )  /DIAMD2 

[5]  -*(TABLElI  ;6l*0)  /DIAMD1 

[6]  TEMPt-(  (  TABLEII;  3  ]  -TABLEl  ;  2]  )  aTABLP[I  ;  1]  =TABLEl  ;  1]  ) 

[7]  TABLEII;  3>(  TEMP  /  \  Ip  TABLE)  l  1]  )[1] 

[8]  -+DIAMD 1 

[9]  DIAMD2iTABLEOt-TABLE 


V 


∇DIAGRAMOUT[⎕]∇


∇ DIAGRAMOUT TABLE;T

[  1]  6)(  7,  T)  p  (  \T*-(  pTABLE) 111  )  , 

V 


∇ TABLE←DIAGRAMREAD;LIST

[1]  TABLE←⍳0

[2]  LIST←6⍴0

[3]  DIARD1:LIST←⎕

[4]  →((LIST,⍳0)[1]=999)/DIARD2

[5]  TABLE←TABLE,LIST

[6]  →DIARD1

[7]  DIARD2:TABLE←(((⍴TABLE)÷6),6)⍴TABLE


∇

MULTIPLE  PARSE  ROUTINES


CHAINMATRIX 

This algorithm prepares vectors which indicate the
syntactic units that are the initial components in the
definitions of other syntactic units. For each definition:

DEFIN - contains the numeric representation of the
        unit being defined.

INITL - contains the numeric representation of the
        initial term in the definition.

ROWR  - indicates the row index of RULES for this
        definition.

COLR  - indicates the column index of RULES for
        the initial element in this definition.
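
The four vectors can be pictured with a short sketch in Python.
It is an illustration under stated assumptions rather than a
transcription of CHAINMATRIX: the grammar is held in a hypothetical
dictionary (RULES) instead of the numeric array, and the column
index is counted within a flattened rule in which one separator
position follows each alternative. The function name chain_vectors
is likewise an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: unit -> alternatives.
RULES: Dict[str, List[List[str]]] = {
    "sum": [["number", "+", "number"], ["number"]],
    "number": [["digit", "number"], ["digit"]],
    "digit": [["0"], ["1"]],
}


def chain_vectors(
    rules: Dict[str, List[List[str]]]
) -> Tuple[List[str], List[str], List[int], List[int]]:
    """Return parallel lists (DEFIN, INITL, ROWR, COLR), one entry per definition."""
    defin: List[str] = []
    initl: List[str] = []
    rowr: List[int] = []
    colr: List[int] = []
    for row, (unit, alternatives) in enumerate(rules.items(), start=1):
        col = 1  # 1-origin position within the flattened rule
        for alternative in alternatives:
            defin.append(unit)            # unit being defined
            initl.append(alternative[0])  # its initial component
            rowr.append(row)              # which rule holds the definition
            colr.append(col)              # where the definition starts
            col += len(alternative) + 1   # skip the components and one separator
    return defin, initl, rowr, colr
```

Reading the result column by column gives one chain link per
definition: for instance the second definition of sum begins with
number, which in turn begins with digit.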

I  COPYPARSE  J

This function copies row I of PNAME for columns 1 to
J into the first free row of PNAME. Similarly for PSYNP.
The syntax pointer in PSYNP is also reset.

OUTPUTPARSE  I 

Displays the rows of PNAME and PSYNP.

MULTICHAIN 

This routine prepares the vectors SNAME and SSUCC and
the arrays CNAME and CSUCC. The syntactic specifications are
contained in the numeric array RULES.

CNAME    - indicates the chain of initial constituents
           of definitions.

CSUCC    - points to an element in SNAME which must follow
           in order to complete this definition.

SNAME    - indicates elements which must follow other
           elements.

SSUCC[i] - points to an element in SNAME which must follow
           SNAME[i].
           = 0 No following element possible.

SYNTAX  MULTIPARSE  PROG 

This algorithm determines all possible structures for
the string of symbols contained in PROG. The main arrays are:

PNAME - gives the numerical representation of each
        parse.

PSYNP - indicates the syntactic unit which must follow
        the corresponding element in PNAME in order to
        extend that particular parse. The element of
        largest index is called the syntax pointer.
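
The idea of carrying every possible parse forward one character
at a time can be sketched in Python as follows. This is not the
PNAME and PSYNP layout used by MULTIPARSE. It assumes a
hypothetical grammar (GRAMMAR) with no left recursion and no empty
alternatives, and represents each live parse simply by the tuple
of symbols it still expects. The names advance and count_parses
are assumptions.

```python
from typing import Dict, List, Tuple

Stack = Tuple[str, ...]  # symbols a partial parse still expects

# Hypothetical ambiguous grammar: x's grouped into pieces of one or two.
GRAMMAR: Dict[str, List[List[str]]] = {
    "unit": [["x"], ["x", "x"], ["x", "unit"], ["x", "x", "unit"]],
}


def advance(grammar: Dict[str, List[List[str]]], stacks: List[Stack], ch: str) -> List[Stack]:
    """Advance every live parse over one input character."""
    survivors: List[Stack] = []
    work = list(stacks)
    while work:
        stack = work.pop()
        if not stack:
            continue  # this parse expected no further input
        top, rest = stack[0], stack[1:]
        if top in grammar:  # expand a syntactic unit into each alternative
            work.extend(tuple(alt) + rest for alt in grammar[top])
        elif top == ch:  # terminal matches: the parse survives
            survivors.append(rest)
    return survivors


def count_parses(grammar: Dict[str, List[List[str]]], goal: str, text: str) -> int:
    """Count the structures of text; only the parses for the current character are kept."""
    stacks: List[Stack] = [(goal,)]
    for ch in text:
        stacks = advance(grammar, stacks, ch)
    return sum(1 for stack in stacks if not stack)
```

For the string xxx the sketch finds three structures (x x x, x xx,
and xx x). Only the set of parses for the previous character and
the set being built for the current one exist at any time, which
corresponds to the storage for parses n - 1 and n noted in the
conclusions.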

IN  SPLITCHAIN  NEXT 

This  recursive  function  processes  syntactic  units  which 
can  lead  to  two  or  more  syntactic  units. 

OUTPUTCHAIN 

This function displays the vectors INITL, DEFIN, ROWR,
and COLR.


∇CHAINMATRIX[⎕]∇

∇ CHAINMATRIX

[1]  INITL←DEFIN←ROWR←COLR←⍳0

[2]  NRLS←(⍴RULES)[1]

[3]  I←0

[4]  CHAN1:→(NRLS<I←I+1)/CHAN2

[5]  J←3

[6]  CHAN3:→(RULES[I;J←J+1]=0)/CHAN1

[7]  →(RULES[I;J-1]>2)/CHAN3

[8]  INITL←INITL,RULES[I;J]

[9]  DEFIN←DEFIN,RULES[I;2]

[10]  ROWR←ROWR,I

[11]  COLR←COLR,J

[12]  →CHAN3

[13]  CHAN2:'FINI'

∇


∇COPYPARSE[⎕]∇

∇ I COPYPARSE J

[1]  C+'COPY  PARSE  I PR  FROM  OUTSIDE  BRACKET  TO  BRACKET  JB 
R  INTO  NEXT  POSITION' 

[2]  P«-0 

[3]  C’<9PY2  :+(</<£/«-£/+ 1  )  /COPY1 

[4]  P/MMP[PT;P>P/1MMP[I;P] 

[5]  P5y/\7P[/\/T;P>PPYtfP[I;£/] 

[6]  +COPY2 

[7]  C+' RESET  SYNTAX  POINTER  FOR  INSIDE  BRACKET ' 

[8]  COPY  1  :P5Y^P[P7,;e/]^5,SPC,C[  |  PSTtfP[  NT  ;J’]  ] 

[9]  PARSEINT  ;2]+PARSEl I ;2] 

[10]  ->0 

V 


∇OUTPUTCHAIN[⎕]∇
∇ OUTPUTCHAIN

[1]  »  IN IT  DEFN  ROW  COL ' 

[2]  <$(  S,pINITL)p  (  ipItfITL)  tINITL,  DEFI N  tROWR  ,  COLR 


V 


∇MULTICHAIN[⎕]∇


∇ MULTICHAIN

Cl]  C^' INITIAL  TERMINAL  CHARACTERS ' 

[2]  CHNRL+-(  p RULES  )  p  0 

[3]  NUMIT+xpINITL 

[4]  NRL+ 0 

[ 5]  INITERM+i-INITLeDEFIN) /NUMIT 

[6]  CNAME+CSUCC+-  1  0  pO 

[7]  SNAME+SSUCC+\0 

[8]  NUMC+- 1  0 

[9]  £«-0 

[10]  KCH+-0 

[11]  MPLC1  :-►(  (pIWim?M)<£«-/s:  +  i  )  /MULC2 

[12]  KCH+KCH+ 1 

[13]  CNAME<-(  TEMP+KCHaKCH-l )  \  CNAME 

[14]  CSUCC+TEMP\CSUCC 

[15]  NUMC+NUMC, 0 

[16]  CW4MET  1  ;£<?#  >7/177^1^1  ] 

[17]  CP£/CC[1  ;£CP>0 

[18]  7£MZ^7/l/7TPPM[7n 

[19]  1  SPLIT CHAIN  LEAD 

[20]  +MULC 1 

[21]  MULC2 : 1 FINI 1 

V 


∇OUTPUTPARSE[⎕]∇
∇ OUTPUTPARSE I

[1]  +(~OUTPUT) / 0 

[2]  '  ’ 

[ 3]  ( 'PARSE  ' ; I) 

[4]  TEMP+(  0*,PNAMEII  ;]  )/i  (pPMMP)[2] 

[5]  .PflMMFCljTPMP] 

[6]  ,PSYNPtI  ;  7PMP] 


V 
∇MULTIPARSE[⎕]∇
∇ SYNTAX MULTIPARSE PROG

[1]  OUTPUT* 0 

[2]  PRO GT*I DEN TPRO GRAM  PROG,1  1 

[ 3 ]  NCR A  IN* ( p  CNA ME) l 2 ] 

[4]  C* ’ SET  UP  INITIAL  PARSES ’ 

[5]  PT* 1 

[6]  IWP«-(  (  (C'^MP^PPOGTCPT]  )=CMff[l;  ]  )  /\N  CHAIN)  ,  iO 

[7]  NPR*pINP 

[8]  PNAME*PSYNP*(  NPR  ,  (  [  /JWAfCClffP]  )  )pO 

[9]  NPARSE*NPRp 0 

[10]  P/U?SP«-$(  2,ffPP)p(tfPi?pO)  ,  iWPP 

[11]  I«-0 

[12]  MLTP1  :  -*(  NPR  <1*1  +  1 )  /MLTP2 

[13]  WP4PSP[I>m^<7[IWP[  J]] 

[14]  PMMP[I;S^(  T+l)-iT]+CJIMMP[  (  i  T*NPARSE[  I  ]  )  ;IffP[  J]] 

[15]  PPJ^P[J;(S']^,C,5PPC[  iTjUfKl]] 

[16]  OUTPUTPARSE  I 

[17]  (P^PPP[J;];»  *  1 ; PNAMElI ; i NPARSEl J] ] ) 

[18]  ->MLTP1 

[19]  MLTP2 : NT*NPR+1 

[20]  -*(  (pPROGT)<PT*PT+l  )/0 

[21]  IffP«-(  (  (P/MP^PPCXmPT]  )=C^Ml;]  )/\N CHAIN)  ,  iO 

[22]  PCHAIN*pINP 

[23]  C*' CONSIDER  EACH  POSSIBLE  PARSE' 

[24]  JPP-j-0 

[25]  MLTP3 : *( NPR  <IPR*IPR+1 ) /MLTPh 

[26]  -KW^CpPflMMPKl]  )/MLTP6 

[27]  TEMPA*NTa( pPNAME)l 1] 

[28]  PNAME*TEMPA \ [ 1 ] PNAME 

[29]  PS  YNP*TEMPA \ [ 1 ] PSYNP 

[30]  NPARSE*TEMPA \ NPARSE 

[31]  PAR  SE*TEMPA\\_1~\P ARSE 

[32]  C*' CONSIDER  EACH  BRACKET  FROM  THE  INNERMOST ' 

[33]  ML  TP 6  i  JBR*NPARSEl  IPR  ] 

[34]  ML  TPS  :  -*(  1>JBR*JBR-1)  /MLTP3 

[35]  C* ’ GET  LINK  FOR  THIS  PARSE -BRACKET ' 

[36]  LINK*PSYNPIIPR ;JBR1 

[37]  -*(  LINK-0  )  /MLTPS 

[38]  C*' BRANCH  IF  DIRECT  MATCH ' 

[39]  +(SNAMEl \ LINK] =CHAR ) /MLTP1 1 

[40]  +MLTP1 

[41]  MLTPlliIPR  COPYPARSE  JBR 

[42]  PNAMEINT; JBR+1]*CHAR 

[43]  PSYNPlNT;JBR+ 1]*1 

[44]  NPARSELNT]*+/0*PNAMElNT 

[45]  OUTPUTPARSE  NT 




[46]  NT+NT+1 

[47]  -+MLTP5 

[48]  C*-  ’  CHECK  FOR  POSSIBLE  INDIRECT  EXTENSION' 

[49]  MLTP7  iKCH*-0 

[50]  MLTPQ :+(PCHAIN<KCH*-KCH+l) /MLTPb 

[51]  +(~SNAMEZ  |  LINK]eCNAMEl  ;INPlKCH]]  )  /MLTPQ 

[52]  I PR  COPYPARSE  JBR 

[53]  TEMP*- JBR 

[54]  EXT+CNAMEl  ;INPlKCH]  ]  i SNAMEl  I  LINK'] 

[55]  ML  TP  9 : TEMP *- TEMP + 1 

[56]  +(TEMP<(pPNAME)l2]) /MLTP30 

[57]  TEMPA*-TEMPa(pPNAME)l2] 

[58]  PNAME+-TEMPA  \PNAME 
[  59  ]  PSYNP*-TEMPA\PSYNP 

[60]  ML  TP  30:  P  NAM  El  NT ;  TEMP  ]  *-  C  NAM  El  EXT ;INPlKCH]  ] 

[61]  PSYNPl  NT ;  TEMP]  *-CSUCCl  EXT  \INPlKCH]  ] 

[62]  +(EXT=1) /MLTP10 

[63]  EXT*-EXT- 1 

[64]  -+MLTP9 

[65]  MLTP10  iNPARSElNT]*-+/0*PNAMElNTi  ] 

[66]  OUTPUTPARSE  NT 

[67]  NT*-NT+1 

[68]  -*•(  NT<  (  p  PNAME )  [  1  ]  )  /MLTPQ 

[69]  TEMPA*-NTa(  p PNAME)  l  1] 

[7  0]  PflMME^TPMP^U  l]PilMAfff 

[71]  PSYNP+TEMPA \[ 1 ]PSYNP 

[72]  NPARSE*-TEMPA\NPARSE 
[7  3]  PARSE*-TEMPA\l  1]  PARSE 

[74]  -+MLTP  8 

[75]  C*-  *  ADJUST  PARSE  TABLE ' 

[76]  MLTP>4  TEMP*- 1  WPP 

[77]  P/IMMPC  TPMP  ;  ]«-P5  JWP[  TEWP  ;  >0 

[78]  Pj4P£,£,[™fP;]«-0 

[79]  WPAPS'E’[!Z7£’MP]^0 

[80]  NT*-NT-1 

[81]  NNP*-NT~NPR 

[8  2]  +(WWP=0)  /MLTPHO 

[83]  J«-0 

[84]  ’  ’ 

[85]  L«-0 

[  86]  ML  TP  2  0  :  -*(  NNP<I*-I  + 1  )  /MLTP2  5 
[8  7]  TEMP*- 1 +  NPR 

[8  8]  +(  a/P^MP[TPMP;]=0)/MLTP20 

[89]  L*-L  +  l 

[90]  NPARSE[L]←NPARSE[TEMP]

[91]  PARSE[L;1]←PARSE[TEMP;2]

[92]  PARSE[L;2]←L

[93]  PNAME[L;]←PNAME[TEMP;]

[94]  PSYNP[L;]←PSYNP[TEMP;]

[9  5]  {PAR  SELL i ] ;  5  *  ’  ; PNAMEL L ; i NPARSEL I] ] ) 

[96]  J+TEMP- 1 

[97]  ML  TP  21  :  -►(  NT<J<-J  + 1  )  /METP20 

[9  8]  ->(~a/(  (  9 PNAMEL  L  ;  ]  )  =  ,  PNAMEL  J  ;  ]  )  ,  (  ,PSYNPLL;  ]  )=tPSYNPLJ 

l ] ) /MLTP21 

[99]  PNAMElJi l+PSYNPLJ; >0 

[100]  PARSELJ  ;  ]«-0 

[101]  NPARSEL  I>0 

[102]  -+MLTP2 1 

[103]  ML  TP  2  5  :  NPR+-L 

[104]  +MLTP2 

[105]  MLTPHO : ( 'ERROR  ' ;PT) 

V 


VSPLITC7L4I/\/[n]V 
V  IP  SPLIICIMIP  NEXT  \  J 

[1]  NEXT+NEXT, iO 

[2]  -*(  0=pNEXT) /O 

[3]  IP^-IP+1 

[4]  +(7P<(  pCIMMP)[  1]  )  /SPEC  A 

[  5  ]  CNAME+  (  TEMP+IN a  (  p  CflMMP  )  [  1  ]  )  \  [  1  ]  CWAMP 

[6]  £SP<7(7^TPMP\[ 

[7]  SPLCh  :  «7«-0 

[8]  5PIC1  :->(  {pNEXT)  <J+J  +  1  )/0 

[9]  ->(  1=  1 )  / SPLC3 

[10]  IN+STRIN 

[11]  KCH+KCH+ 1 

[12]  CNAME+ ( TEMPER CHaKCH - 1 ) \ CP4MP 

[13]  CSUCC+TEMP\CSUCC 

[14]  NUMC+-NUMC  9  0 

[15]  CIMMP[  7PMP;ZCP]^C7\MMP[  ( IPMP^-iIP-1  )  ;P(7P-1] 

[16]  CSPCCT  IPMP ;  PCP]  ^C5PCC,[  7PMP  ;KCH-1  ] 

[17]  SPEC  3  i  CN  AMELIN  ;KCHh+DEFI  NL  NEXTL  7  ]  ] 

[18]  IP0P^P^f/P[PPJI[«7]  ] 

[  19]  IC£L^l  +  <?tfLP[PPmi]] 

[20]  +(  (™P^CMi?L[I^f/;iraL]  )=0)/5PIC15 

[21]  CSUCCLlN;KCFn+TEMP 

[22]  +SPLC1 

[23]  S’PICl  5  :  T£WP<-i?PLPS[I/?<7J/;I(m] 

[24]  •>(  TEMPe  0  2)  /SPECS 

[25]  ->  (  TEMP  -3)  / SPLC3 

[26]  C5P(7C,[IP;PC,P]^PPI^PPL  +  1 

[27]  CHNRLLlROWiICOLl+NRL 

[28]  , TEMP 

[29]  -+SPEC1 2 

[30]  SPECS  :  CSUCC  LlNiKCFH  +  O 


. 
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[31]  -+SPLC7 

[  32]  SPLC  6  :  CSUCCZIN iKCH]+-NRL+NRL+l 

[33]  CHNRLIIROW1ICOLI+-NRL 

[34]  TEMPER ULESlIROW  \I COL+-I COL  +  1  ] 

[35]  S NAME+S NAME , TEMP 

[36]  BRKET+-NRL 

[37]  SPLC 8  :-K  ( rPMP«-i?PL£S[  I7?0J/  ;  ICOL+-ICOL  + 1  ]  )=4)/STLC9 

[38]  SSUCC+SSUCCtNRL+NRL+ 1 

[39]  ,  2TOP 

[40]  +SPLCB 

[41]  SPLC9  :  SSU CC+-SSU CC  ,  -BRKET 

[42]  SPLC12 iTEMP+RULESlIROW iICOL+ICOL+1] 

[43]  -*(  TEMPe  0  2  )  /  SPLC20 

[44]  -►(2TOP=3)/SPLC,11 

[45]  SSUCC+SSUCC  tNRL<-NRL+ 1 

[46]  ,  TPMP 

[47]  -+SPLC1 2 

[48]  SPLC1 1  :  BRKET+I COL+1 

[49]  SPLCl*ii+((TEMP+RULESlIROWiICOL<-ICOL+l]  )=4)  /SPLC13 
[5  0]  SSUCC+SSUCC  ,NRL<-NRL+ 1 

[51]  SNAME+SNAME , TEMP 

[52]  +PPLC14 

[5  3]  SPLC1 3 : SSU  CC+SSUCC , -BRKET 
[54]  -+SPLC1 2 

[5  5]  SPLC20  1  ->(  (  pSNAME )  =pSSUCC  )  /SPLC1 

[56]  SSUCC+SSUCC ,0 

[57]  SPLC7:NEST+( ( ] =INITL) /NUMIT) , 1 0 

[58]  -*(  0<pNEST)  /SPLC 2 

[59]  WPMC,[£C'P>IW 

[60]  SPLC2:IN  SPLITCHAIN  NEST 

[61]  STRIN+IN 

[62]  +SPLC1 
V 
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UTILITY  ROUTINES 

IDENTBASE 

Initializes the structures EXTERNAL, PRIMARY, and
OVERFLOW and inserts the symbols ::=, |, :[, and ]: into
these tables.

NUMBER ← IDENTCOMP WORD

Converts the six alphanumeric characters in the vector
WORD to a single code number, NUMBER, reduced modulo 200
so that it can serve as a row index of PRIMARY.

TEXTA ← IDENTCORR TEXT

Provides the means to correct the alphabetic vector
TEXT which has been read by IDENTTEXT. The corrected text
is stored in TEXTA.

IDENTEXT  LIST 

Calculates the identification numbers for, and classifies,
n six-character alphabetic words contained in the n x 6
matrix LIST. The results are stored in EXTERNAL, PRIMARY,
and OVERFLOW.

NUMCODE ← IDENTFIND WORD

Determines the identification numbers NUMCODE for n
six-character alphanumeric words contained in the n x 6
matrix WORD. The numbers depend on the words in the matrix
EXTERNAL. If a word is not in EXTERNAL, it is added.

VALUE ← IDENTIFY WORD

Determines the identification number VALUE for the six
alphabetic characters in the vector WORD. The same procedure
is used as in IDENTFIND.

LINE ← IDENTLINE SYNTEST

Provides successive parts of the alphabetic vector
SYNTEST with each call. The parts are separated by the
carriage return symbol and are returned in the vector LINE.
RESTART must be set to 0 for the first call.

IDENTOUTPUT  TEXT 

Prints the vector TEXT stored by IDENTTEXT with the
line number and the cumulative number of characters to that
line.

NUMBER ← IDENTPROGRAM PROG

Converts a string of words, separated by blanks,
contained in the vector PROG to numbers and returns the
results in NUMBER. If a symbol is not contained in EXTERNAL,
a number determined by IDENTSYMBOLS is used.

IDENTSYMBOLS

Assigns numeric codes to the terminal characters
contained in INPSYM so as to specify the same code for
equivalent symbols. The codes are contained in INPCODE.

RULE ← IDENTSYNTAX SYNTAX

Converts the alphabetic representation SYNTAX of a
set of n rules of maximum length m to an n x (m+2)
numeric matrix RULE containing the identification numbers
of the symbols in SYNTAX.

SYNTEXT ← IDENTTEXT

Reads and stores a vector SYNTEXT of alphabetic
characters and separates the lines with a carriage return
symbol. The routine stops when an 'A' is entered as the
first character in a line.

COVE ← OPERATOR OPERS

Places a vector OPERS of n words separated by blanks
into an n x 6 matrix COVE. Each word is left justified
in its row of COVE.

COVE ← OPERATORM OPERS

Places a vector OPERS of n symbols into an n x 6
matrix COVE with the symbols in the first column.

OPEXTERNAL  Outputs EXTERNAL table

OPPRIMARY   Outputs PRIMARY table

OPOVERFLOW  Outputs OVERFLOW table

LOAD T

This function adds an element to one of three stacks
depending on the letter supplied by T.

UNLOAD T

This function stores the value of the top element on
one of three stacks, then removes this element from the
stack.

STACKO ← UNSTACK STACK

This function produces a vector STACKO which is
equivalent to the vector STACK with the last element removed.

TOP ← VALUE STACK

This function stores the value of the last element
of the vector STACK in the variable TOP.
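
The four stack routines described above can be sketched in Python as follows. This is only an illustration: the dictionary layout and the key letters G, S, and C (goal, source, and character stacks) are assumptions taken from the names in the listings, and the dictionary `current` stands in for the global variables GOAL, SOURCE, and CHAR.

```python
# Three stacks selected by a one-letter key, mirroring LOAD, UNLOAD,
# UNSTACK and VALUE. The dict layout is an assumption for illustration.
stacks = {"G": [], "S": [], "C": []}          # goal, source, character stacks
current = {"G": None, "S": None, "C": None}   # stand-ins for GOAL, SOURCE, CHAR


def value(stack):
    """Return the last element of the stack (VALUE)."""
    return stack[-1]


def unstack(stack):
    """Return a copy of the stack without its last element (UNSTACK)."""
    return stack[:-1]


def load(t):
    """Append the current value for key t to its stack (LOAD)."""
    stacks[t] = stacks[t] + [current[t]]


def unload(t):
    """Store the top of stack t in its variable, then remove it (UNLOAD)."""
    current[t] = value(stacks[t])
    stacks[t] = unstack(stacks[t])
```

As in the APL routines, UNLOAD is built from VALUE and UNSTACK rather than from a single destructive pop, so each stack is always replaced by a new vector.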
∇IDENTBASE[⎕]∇

∇ IDENTBASE

[1]  EXTERNAL←OVERFLOW←⍳0

[2]  PRIMARY← 200 2 ⍴0

[3]  IDENTEXT OPERATOR '::= | :[ ]: '

∇


VIDENTCOMP [D]V 


V  NUMBER*-!  DENT  COMP  WORD  ;  ALPHA  ;  VALUE  ;  TEMP 


[1]  ALPHA*- '  12  345  6  7  89  0/lFPPFFGFIJFLMyC>P<3FFT£/mfYZ 

VA+-Xv * 

[  2  ]  ALPHA*- ALPHA  ,  '  Zwep  ~  +  4 \  o*^-->af  L_VA°[][(])c=>nu±T| 

[3]  IML£/F<-0,  i"l  +  p>lLPP/l 

[  4  ]  TEMP*-  (SpVALUElAL  PH  A  i  WORD  ],6p0)[7-i6] 

[5]  NUMBER*- 1  +  200  |  f  1  Oi  (  6a  3  ) /TEMP )  +1 0±(  6a>3  ) /TEMP 

V 


<<=>>* 
.  \  /  :  1 


∇IDENTIFY[⎕]∇

∇ VALUE←IDENTIFY WORD

[1]  VALUE←IDENTFIND(1 6)⍴WORD,6⍴' '

∇


vipff:fliff[[]]v 

V  LIN EVIDENT LINE  SYNTEST 

[1]  LINE*-  \  0 

[2]  ^FFFITylFTVIPTPl 

[3]  SYNTEXT*-S  YNTEST 

[4]  RESTART* 1 

[5]  CR*-' 

i 

[6]  IDTL1 :+( (pSYNTEXT)=0 )/0 

[7]  TEMP*-  (  ( pSYNTEXT)aSYNTEXT\CR ) 

[8]  LINE+TEMP /SYNTEXT 

[9]  LIFFCpPIPF]*-'  ’ 

[10]  SYNTEXT*(~TEMP) /SYNTEXT 


V 
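
The behaviour described for IDENTLINE, returning one part of the text per call, can be sketched with a Python generator. This is an illustration under stated assumptions rather than a transcription of the listing: the generator's own state replaces the RESTART flag, and the newline character stands in for the carriage return symbol.

```python
def ident_lines(text, cr="\n"):
    """Yield successive parts of text, split at the carriage-return symbol."""
    start = 0
    while start < len(text):
        end = text.find(cr, start)
        if end == -1:              # the last part has no terminating symbol
            end = len(text)
        yield text[start:end]
        start = end + 1            # skip over the separator itself
```

A caller such as IDENTOUTPUT would simply loop over `ident_lines(text)` and stop when the generator is exhausted, which corresponds to the empty result that ends the APL loop.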


NIDENTCORRlUlV 


V  TEXT  A+-I  DENT  CORR  TEXT 

[  1  ]  CR+- ' 

[2]  NUMBER 123456789’ 

[3]  IDTC6 iLOCN+U, iO 

[4]  +(LOCNl 1]=0 )/IDTCl 

[5]  CUAR+LOCN Cl] 

[6]  NUMB+LOCN [2] 

[7]  (Cffj42?-1  )  +  \tfZ/MB] 

[8]  LJZVtfC  (LINE  =  CR)  /ip  LINE]*'  C' 

[9]  LINE 

CIO]  CORVEC*-NUMBp\n,NUMBp  ’  ’ 

[11]  LINE+(TEMP+CORVEC*' / ’ ) /LINE 

[12]  Ctfi?  72?  O  (  ~  1 )  +  NUMBER  i  TEMP /COR  VEC 

[13]  COflmM+iO 

[14]  WC0i?«-pC<9Z?7£’£ 

[15]  I«-0 

[16]  12)7(73  :+(Z7CtfZ? <1^1+1  J/IDTCl 

[17]  ->(C,aff7Z?(7[J]^0)/IZ?!Z,C,2 

[18]  CORVECA+-CORVECA  ,  1 

[19]  +IDTC3 

[20]  IDTC2  :  CORVE CA+-CORVECA  ,  ( <702? FFCC  I]  p  0  )  ,1 

[21]  +IDTC3 

[22]  IZ)701  :LIZVE+00Z?7EC,4\£Jff£ 

[2  3]  LIZ!7£ 

[24]  NEW+\0 

[25]  NCOR<-p  LINE 

[26]  CORR<-(  (  l)+Ca/?7£C[4iO)p  ’  ' 

[27]  CORR+NCORpCORR  ,  \H,NCORp  ’  ’ 

[28]  NEW+\0 

[29]  j>0 

[30]  IDTC5  :  ->(  NCOR  <I<~I+1  )  /IDTCH 

[31]  NEW+NEW,lp((T*'  ' ) /T+LINElI] , CORR [I]),’  ’ 

[32]  -+IDTC5 

[33]  IDTCHiNEWt (NEW=' C' )/\pNEW]+CR 

[34]  TEMP+(NTEXTu)(NTEXT+-pTEXT)  -CHAR  +  NUMB-1  ) 

[35]  TEXT<-(  (NTEXTa.CHAR-1  )  /TEXT)  t  NEW ,  TEMP  /  TEXT 

[36]  +IDTC6 

[37]  IDTC1  :  TEXTA+-TEXT 

V 
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WIDEN  TEXT [DU  V 

V  IDENTEXT  LIST;PT;OVFL 

[1]  PT+O 

[2]  IDTE1  :+(  (pLIST)lll<PT<rPT+l)  /O 

[3]  NUCODE+I DENT COMP  WORD+-LI  STlPT ;] 

[4]  +(PRIMARYlNUCODE i2]*0) /IDTE2 

[5]  PRIMARYtNUC0DE-,2l+l  +  (pEXTERNAL)ll] 

[6]  -+IDTE3 

[  7]  IDTE 2  :-►(  a/F*FFFF,4L[  PPIAMFY[  NUCODE  ;  2  ]  ;  ]  =WORD  )  /IDTE 1 

[8]  -+(PRIMARYlNUCODE;  1]*0)  /IDTE 4 

[9]  PRIMARYlNUCODEill<ri  +  (pOVERFLOW)in 

[10]  -*  J  DTE  5 

[11]  JZ?7’E4  :  0 7FL«-PJ?.Z7fi4i?Y[  NUCODE  ;  1  ] 

[12]  IDTE1 :  -*(  A/FXTFPML[£yFFFLCW[C>yFL;  2]  ;  ]  =  )  /IDTE1 

[13]  -►(  0 7FFFL0f/[  0 FFL  ; 1 ] *0  )  /IDTE3 

[14]  OVERFLOWlOVFL  ;  1  ]«-l+  (  pOVERFLOW )  [  1  ] 

[15]  -+IDTE3 

[16]  I  DTE  6  :OVFL+-OVERFLOWlOVFL  ;  1] 

[17]  -+IDTE7 

[18]  IDTEb  :  TEMP<-(  (  1  +  (  pOVERF  LOW)  [  1  ]  )  ,2) 

[19]  OVERF  LOW+-TEMPp  (  .OVERFLOW  )  ,  0  ,  1+  (  pEXTERNAL  )  [  1  ] 

[20]  IDTE3 : TEMP+( ( 1+ ( pEXTERNAL ) [ 1 ] ) , 6 ) 

[21]  EXTERN AL+TEMPp ( .EXTERNAL) .WORD 

[22]  -kTFITFI 

V 


∇IDENTOUTPUT[⎕]∇
∇ IDENTOUTPUT TEXT

[1]  RESTART←0

[2]  LINE←0

[3]  CHARS←1

[4]  IDTO1:→(0=LNG←⍴TEMP←IDENTLINE TEXT)/0

[5]  ((LINE←LINE+1);CHARS;' ';TEMP)

[6]  CHARS←CHARS+LNG

[7]  →IDTO1

∇
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NIDENTFIND CD]V 

V  NUMCODE*IDENTFIND  WORD ;NUCODE ;OVFL 

[1]  NUMCODE* 1  0 

[2]  1+ 0 

[3]  IDTF  5  :-*-((  pWORD  )  [  1]  <I-*-I+l  )  /  0 

[4]  NUCODE*IDENTCOMP  TEMP*WORDlI ; ] 

[ 5 ]  *{PRIMARYl NUCODE ;  2  ]  =  0  ) /IDTF 1 

[6]  +(  A/IFMP  =  FyiFFML[PFm4Fy[F£/C,ODF;2]  ;  ]  )  /IDTF 2 

[7]  ■>(  Pi?m4Py[ffDCOPF;  1]  =0  )  /IDIF1 

[8]  OVFL+PRIMARYINUCODE i 1] 

[9]  IDTF  3  i  ■+(  a/TEMP=EXTERNALIOVERFLOWIOVFLi21  ;  ]  )/IDIF4 

[10]  +  (0FFPFL0J/[D7FL;  1]  =  0)  /IDIFl 

[11]  OVFL*OVERF LOWlOVFL; 1] 

[12]  -+IDTF  3 

[13]  IDIF1 : IDEE  TEXT { 1  6 ) p IFMP 

[14]  NUMCODE+NUMCODE ,  (  p  PXIPP/1M  L  )  [  1  ] 

[15]  +IDTF 5 

[16]  IDIF2  :FDMDDDF+FDMFDDF,PPm4Fy[Fi/C,DDF;2] 

[17]  -+IDTF5 

[18]  ID IP 4 :NUMCODE*NUMCODE,OVERFLOWlOVFL ; 2] 

[19]  -+IDTF5 

V 


VIDFWTPPDGF4M[  D  ]  V 

V  NUMB ER*IDENTPRO GRAM  PROG 

[1]  NUMBER* lO 

[2]  IDENTSYMBOLS 

[3]  CODE*OPER  ATOR  PROG 

[4]  I  PR* 0 

[5]  IDTP1  :  -*(  (  pFDDF)[  1]  <IP/?«-IPfl+l )  /O 

[6]  NU CORESIDENT  COMP  TEMP*  t CODElIPR ; ] 

[7]  ->(  PRIMARYlNU  CODE  ;2]=0)/IDIP6 

[8]  ->(  A/IFMP  =  FITFFtfi4L[PPm4i?y[FP(70DF;2]  ;  ]  )/IDIP 7 

[  9  ]  ->(  PRIMARYINUCODE  ;  1  ]  =0  )  /ID IP 6 

[10]  DyPI^PPIM^Py[PPPDDP; 1] 

[11]  IDTP2  :  -»■(  A/IFMP=FIIFPffAL[0yFFFL0J/[0FFL  ;  2]  ;  ]  )  /IDTP 8 

[12]  +( 0 VERFLOWl 0 VFL ; 1 ] =  0 ) /ID IP6 

[13]  OVFL*OVERF LOWlOVFL ; 1 ] 

[14]  -+IDTP2 

[15]  IDTP6 : NUMBER*NUMBER t INPCODElINPSYM \ TEMPI  1]] 

[16]  -+IDTP1 

[17]  IDIP7  :FP^PFP^FDMPFP,PPIMi4Py[FPC,DDF;2] 

[18]  +IDTP1 

[19]  IDTP8 : NUMBER*NUMBER tOVERF LOWlOVFL ; 2] 

[2  0]  -+IDTP1 


V 


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VIDENTSYMBOLSl □] V 

V  IDENTSYMBOLS 

[  1  ]  INPSYM+- '  AB  CDEFGHI JKLMNOPQRSTU  VWXYZ  0123456789+-X*' 

[2]  SYMBOLS*-'  t.  ,  :  ;(  )<  =  >^ti' 

[3]  INPSYM+INPSYM, SYMBOLS , '?' 

[4]  INPCODE*-(  26pIDENTIFY  '  LETTER  '  )  ,  lOpIDENTIFY  'DIGIT' 

[5]  INPCODE+INPCODE , (2pIDENTIFY  ' ADDOP ' ) , 2p IDENTIF Y  'MUL 
OP' 

[6]  INPCODE+INPCODE , (IDENTFIND  OPERATORM  SYMBOLS ), 

“99 

V 


VIDPWTTE’XTCD]  V 
V  SYR  TEXT*-I  DENT  TEXT 

[1]  INPLIST+-  iO 

[2]  SYRTEXT+- 1  0 

[3] 

i 

[4]  IZOTl  : ItfPLISMU 

[5]  INPLIST+INPLIST , iO 

[  6  ]  -*(  J/VPLJiSTC  1  ]  =  ’  A  ’  )  /  0 

[7]  SYRTEXT+SYRTEXT ,IRPLIST,CR 

[8]  -+IDTT1 


V 
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VIDENTSYNTAXlUlV 

V  RULE+IDENTSYNTAX  SYNTAX 

[1]  RULE+-  iO 

[2]  'MAX  RULE  SIZE ' 

[3]  MAX*-  r+1 

[4]  RESTARTS 

[5]  IDTS1 1  T+-IDENTLINE  SYNTAX 

[6]  T+OPERATOR  T 

[7]  -*(  '  Z  '  =!Z7[  1 ;  J  ) /O 

C  8 ]  S+\0 

[9]  IDTS+- 0 

[10]  IDTS 2:+(  (p!7,)[l]<JD!r5^JZ)2,5+l)/IZ?275  3 

[11]  S+S  ,IDENTFIND(  1  6  )  p Z»[ IDTS  ;  ]  ,  6 p  ’  * 

[12]  +IDTS 2 

[13]  IDTS3  :RULE+(  ( 1  +  (  pPFLF  )  [  1  ]  )  ,MAX+1  )p(  ,RULE)  t  ( 1  +  p 5)  .AM* 
pS.AMZpO 

[14]  -►IPT’S'l 

V 


V0P£Z:P£2?AML[[]]V 

V  OPEXTERNAL ; J 

[1]  LIMPED,  1  0 

[2]  I*-LIMITl 1 ] 

[3]  PP£P«-(LIMIF[  2]  =1 )/(  pFZPFFAMLH  1] 

[4]  +(  (pSTOP)*0) /0PEX1 

[  5  ]  STOP*- L /LIMITl  2]  ,  (  p EXTERNAL  )  [  1  ] 

[6]  I«-I-l 

[7]  0PEX1  °.+(STOP<I+-I+l  )/0 

[8]  ( J  ;  ’  ’  ,  (ETPFF/MZ^  J;  ]*'  ’  )/£TrFFAML[I;]  ) 

[9]  +OPEX 1 

V 


vopppjM>ipy[n]v 

V  OPPRIMARY 

[1]  (PPm42?y[  ;  2]  *0)/[l](  $(  3,  Dp  (  \T*-(pPRIMARY)Lll  ), 

PRIMARY) 

V 


V(9P(7  7PPPLCW[[]]  V 
V  OPOVERFLOW 

[1]  $(  3,  Dp  (  p01/PPFL0r/)[  1]  )  ,  ^OVERFLOW 


V 




VOPERATORIUIV 


V  CODE+OPERA TOR  OPERS ;PT;S 

[1]  CODE+\ 0 

[  2  ]  S+ \ 0 

[3]  PT+-  0 

[4]  0PP1 s-KOPffflSTPT+PT+l] = ’  ')/OPR 2 

[5]  S,«-S#0P£7?S[P?'] 

[6]  -+OPP1 

[7]  £PP2 iCODE+CODE, 6p5, 6p »  * 

[8]  S«-iO 

[9]  +{PT<pOPERS) /OPR1 

[10]  CODE+(  (  (pCODE)t6  )  ,6)pC£Z)P 

V 
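
The effect of OPERATOR, as described in the list of utility routines, can be sketched in Python. The function name `operator` and the `width` parameter are illustrative; truncating words longer than six characters is an assumption based on the six-element reshape in the listing.

```python
def operator(opers, width=6):
    """Place blank-separated words into rows of `width` characters."""
    rows = []
    for word in opers.split():
        # truncate to the row width, then pad on the right with blanks
        rows.append(word[:width].ljust(width))
    return rows
```

Each row of the result corresponds to one row of the n x 6 character matrix, with the word left justified.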


voppp/iirtfPMcniv 

V  CODE+OPERATORM  OPERS 

[  1]  OPERS*-  ,§(  2  t  (  p  OPERS)  )p  OPERS  A  pOPERS)p  ’ 

[2]  CODE*- x  0 
[  3  ]  5*- 1  0 

[4]  P!T«-0 

[5]  OPRM1  :-+(OPERSlPT+PT+ll  =  '  ')/OPRM2 

[6]  S+S  tOPERSlPT'} 

[7]  -+OPRM1 

[8]  OPRM2  i  CODE*-CODE  , 6pS , 6p  *  » 

[  9  ]  S*-  x  0 

[10]  +(PT<pOPPP5)  /OPRMl 

[11]  CODE*-(  (  (p  CODE)  *6)  t6)pCODE 


V 


∇LOAD[⎕]∇

∇ LOAD T

[1]  →(T='GSC')/LOAD1,LOAD2,LOAD3

[2]  LOAD1:GSTACK←GSTACK,GOAL

[3]  →0

[4]  LOAD2:SSTACK←SSTACK,SOURCE

[5]  →0

[6]  LOAD3:CSTACK←CSTACK,CHAR

∇


∇UNLOAD[⎕]∇

∇ UNLOAD T

[1]  →(T='GSC')/UNLD1,UNLD2,UNLD3

[2]  UNLD1:GOAL←VALUE GSTACK

[3]  GSTACK←UNSTACK GSTACK

[4]  →0

[5]  UNLD2:SOURCE←VALUE SSTACK

[6]  SSTACK←UNSTACK SSTACK

[7]  →0

[8]  UNLD3:CHAR←VALUE CSTACK

[9]  CSTACK←UNSTACK CSTACK

∇


∇UNSTACK[⎕]∇

∇ STACKO←UNSTACK STACK

[1]  STACKO←(~(⍴STACK)⍵1)/STACK

∇


∇VALUE[⎕]∇
∇ TOP←VALUE STACK
[1]  TOP←STACK[⍴STACK]

∇
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IDENTEXT 
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NUMERIC  REPRESENTATION  STRUCTURES 

The three matrices used to represent the alphabetic
descriptions of terms by numbers are:

EXTERNAL - A six-column matrix containing the alphabetic
representation of the words. The identification
number assigned to a word is the row index of the
word in EXTERNAL.

PRIMARY - A 200 x 2 matrix. The row index is the code
number assigned to a word by IDENTCOMP. If Column 1
is nonzero, it is the row index of OVERFLOW where
another word with the same code number is referenced.
Column 2 contains the row index of EXTERNAL where a
word with this code number is stored.

OVERFLOW - A two-column matrix which contains references to
words which have the same number assigned to them by
IDENTCOMP. If Column 1 is nonzero, it is a row of
OVERFLOW which references another word with the same
number assigned by IDENTCOMP. Column 2 points to a
row of EXTERNAL which has a word with the given code
number.
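
The way the three structures cooperate can be sketched in Python. This is a minimal illustration, not the thesis routines: the class `SymbolTable` and its method names are hypothetical, and `ident_comp` is a stand-in hash (an assumption), since the exact IDENTCOMP formula is not reproduced here. Row indices are 1-based, with 0 meaning "no entry", as in the description above.

```python
SIZE = 200


def ident_comp(word):
    """Stand-in hash (an assumption, not the thesis formula): 1..SIZE."""
    return 1 + sum(ord(c) for c in word) % SIZE


class SymbolTable:
    def __init__(self):
        self.external = []                            # row k holds word number k
        self.primary = [[0, 0] for _ in range(SIZE)]  # [overflow row, external row]
        self.overflow = []                            # [next overflow row, external row]

    def find(self, word):
        """Return the identification number of word, adding it if absent."""
        word = word[:6].ljust(6)
        slot = self.primary[ident_comp(word) - 1]
        if slot[1] == 0:                  # first word with this code number
            slot[1] = self._add(word)
            return slot[1]
        if self.external[slot[1] - 1] == word:
            return slot[1]
        link = slot                       # follow the chain through OVERFLOW
        while link[0] != 0:
            link = self.overflow[link[0] - 1]
            if self.external[link[1] - 1] == word:
                return link[1]
        number = self._add(word)          # not found: extend the chain
        self.overflow.append([0, number])
        link[0] = len(self.overflow)
        return number

    def _add(self, word):
        self.external.append(word)
        return len(self.external)
```

Because a PRIMARY row and an OVERFLOW row both keep their link in the first column and their EXTERNAL reference in the second, the same loop can walk from PRIMARY into OVERFLOW without a special case.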




APPENDIX  B 


PROGRAMMING  LANGUAGE  GRAMMARS 

The Backus Normal Form and Irons' Notation for a
simple test grammar and a grammar of a subset of ALGOL
are given. The test grammar is equivalent to that given
in Chapter 4. The two changes in notation are that the
symbols < and > are not used and the braces { } are
replaced by :[ ]: . Special routines are used to process
the terminal characters.
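
The relation between the two notations can be illustrated with a short Python sketch. It assumes, as the paired grammars in this appendix suggest, that :[ ... ]: encloses a part that may be repeated zero or more times, so that a left-recursive rule A ::= b | A g corresponds to A ::= b :[ g ]: . The function name `to_irons` and the list-of-symbols representation are hypothetical.

```python
def to_irons(name, alternatives):
    """Rewrite  A ::= b | A g  as  A ::= b :[ g ]:  (lists of symbols).

    Rules of any other shape are returned unchanged, joined with '|'.
    """
    if len(alternatives) == 2:
        base, rec = alternatives
        if rec[0] == name and base[0] != name:
            return base + [":["] + rec[1:] + ["]:"]
    out = []
    for i, alt in enumerate(alternatives):
        if i:
            out.append("|")
        out.extend(alt)
    return out
```

For example, the test-grammar rule TERM ::= FACTOR | TERM MULOP FACTOR is rewritten as FACTOR :[ MULOP FACTOR ]: , while a rule without left recursion, such as FACTOR, is left as it is.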


ALGOL  SYNTAX  (Backus  Normal  Form) 


IDENTF ::= LETTER | IDENTF LETTER
NUMBER ::= DECNUM | + DECNUM | - DECNUM | INTEGR
DECNUM ::= UNSINT . | DECFRC | UNSINT DECFRC
DECFRC ::= . UNSINT
INTEGR ::= UNSINT | + UNSINT | - UNSINT
UNSINT ::= DIGIT | UNSINT DIGIT
VARIAB ::= IDENTF
ARTEXP ::= SIMAEX | IFCLAS ARTEXP ELSE ARTEXP
SIMAEX ::= TERM | SIMAEX ADDOP TERM
IFCLAS ::= IF BOOEXP THEN
TERM   ::= FACTOR | TERM MULOP FACTOR
FACTOR ::= PRIMRY | FACTOR ↑ PRIMRY
PRIMRY ::= NUMBER | VARIAB | ( ARTEXP )
ADDOP  ::= + | -
MULOP  ::= × | ÷
BOOEXP ::= SIMBOO | IFCLAS BOOEXP ELSE BOOEXP
SIMBOO ::= LOGVAL | RELATN | ( BOOEXP )
RELATN ::= SIMAEX RELAOP SIMAEX
RELAOP ::= < | = | > | ≠
LOGVAL ::= T | 1
PROGRM ::= BLOCK . | COMPST .
BLOCK  ::= UNLBLK | LABEL : BLOCK
UNLBLK ::= BLKHD ; COMPTL
BLKHD  ::= BEGIN DECLAR
SIMPTL ::= STATEM | SIMPTL ; STATEM
COMPTL ::= SIMPTL END
COMPST ::= UNLCMS | LABEL : COMPST
UNLCMS ::= BEGIN COMPTL
STATEM ::= UNCSTA | CONSTA | ITRSTA
UNCSTA ::= COMPST | BLOCK | BASSTA
BASSTA ::= UNLBST | LABEL : BASSTA
UNLBST ::= ASSSTA | GOTOST | IOSTAT
ASSSTA ::= LFTPTL ARTEXP | LFTPTL BOOEXP
LFTPTL ::= VARIAB ←
GOTOST ::= GOTO DESEXP
DESEXP ::= LABEL
LABEL  ::= IDENTF
IOSTAT ::= REDSTA | WRTSTA
REDSTA ::= READ ( INLIST )
WRTSTA ::= WRITE ( INLIST )
INLIST ::= VARIAB | INLIST VARIAB
CONSTA ::= IFSTAT ELSE STATEM | LABEL : CONSTA
IFSTAT ::= IFCLAS STATEM
ITRSTA ::= FORSTA
FORSTA ::= FORCLA STATEM | LABEL : FORSTA
FORCLA ::= FOR VARIAB ← FORLST DO
FORLST ::= FOLSEL | FORLST , FOLSEL
FOLSEL ::= ARTEXP STEP ARTEXP UNTIL ARTEXP
DECLAR ::= TYPE TYPLST
TYPE   ::= REAL | INTGER | BOOLEN
TYPLST ::= VARIAB | TYPLST , VARIAB


ALGOL SYNTAX (Irons' Notation)


IDENTF ::= LETTER :[ LETTER ]:
NUMBER ::= DECNUM | + DECNUM | - DECNUM | INTEGR
DECNUM ::= UNSINT . | DECFRC | UNSINT DECFRC
DECFRC ::= . UNSINT
INTEGR ::= UNSINT | + UNSINT | - UNSINT
UNSINT ::= DIGIT :[ DIGIT ]:
VARIAB ::= IDENTF
ARTEXP ::= SIMAEX | IFCLAS ARTEXP ELSE ARTEXP
SIMAEX ::= TERM :[ ADDOP TERM ]:
IFCLAS ::= IF BOOEXP THEN
TERM   ::= FACTOR :[ MULOP FACTOR ]:
FACTOR ::= PRIMRY :[ ↑ PRIMRY ]:
PRIMRY ::= NUMBER | VARIAB | ( ARTEXP )
ADDOP  ::= + | -
MULOP  ::= × | ÷
BOOEXP ::= SIMBOO | IFCLAS BOOEXP ELSE BOOEXP
SIMBOO ::= LOGVAL | RELATN | ( BOOEXP )
RELATN ::= SIMAEX RELAOP SIMAEX
RELAOP ::= < | = | > | ≠
LOGVAL ::= T | 1
PROGRM ::= BLOCK . | COMPST .
BLOCK  ::= UNLBLK | LABEL : BLOCK
UNLBLK ::= BLKHD ; COMPTL
BLKHD  ::= BEGIN DECLAR :[ ; DECLAR ]:
SIMPTL ::= STATEM :[ ; STATEM ]:
COMPTL ::= SIMPTL END
COMPST ::= UNLCMS | LABEL : COMPST
UNLCMS ::= BEGIN COMPTL
STATEM ::= UNCSTA | CONSTA | ITRSTA
UNCSTA ::= COMPST | BLOCK | BASSTA
BASSTA ::= UNLBST | LABEL : BASSTA
UNLBST ::= ASSSTA | GOTOST | IOSTAT
ASSSTA ::= LFTPTL ARTEXP | LFTPTL BOOEXP
LFTPTL ::= VARIAB ←
GOTOST ::= GOTO DESEXP
DESEXP ::= LABEL
LABEL  ::= IDENTF
IOSTAT ::= REDSTA | WRTSTA
REDSTA ::= READ ( INLIST )
WRTSTA ::= WRITE ( INLIST )
INLIST ::= VARIAB :[ VARIAB ]:
CONSTA ::= IFSTAT | IFSTAT ELSE STATEM | LABEL : CONSTA
IFSTAT ::= IFCLAS STATEM
ITRSTA ::= FORSTA
FORSTA ::= FORCLA STATEM | LABEL : FORSTA
FORCLA ::= FOR VARIAB ← FORLST DO
FORLST ::= FOLSEL :[ , FOLSEL ]:
FOLSEL ::= ARTEXP | ARTEXP STEP ARTEXP UNTIL ARTEXP
DECLAR ::= TYPE TYPLST
TYPE   ::= REAL | INTGER | BOOLEN
TYPLST ::= VARIAB :[ , VARIAB ]:


ALGOL SYNTAX (Numeric)


 1  ::=          46  COMPST
 2  |            47  UNLBLK
 3  :[           48  LABEL
 4  ]:           49  :
 5  IDENTF       50  BLKHD
 6  LETTER       51  ;
 7  NUMBER       52  COMPTL
 8  DECNUM       53  BEGIN
 9  +            54  DECLAR
10  -            55  SIMPTL
11  INTEGR       56  STATEM
12  UNSINT       57  END
13  .            58  UNLCMS
14  DECFRC       59  UNCSTA
15  DIGIT        60  CONSTA
16  VARIAB       61  ITRSTA
17  ARTEXP       62  BASSTA
18  SIMAEX       63  UNLBST
19  IFCLAS       64  ASSSTA
20  ELSE         65  GOTOST
21  TERM         66  IOSTAT
22  ADDOP        67  LFTPTL
23  IF           68  ←
24  BOOEXP       69  GOTO
25  THEN         70  DESEXP
26  FACTOR       71  REDSTA
27  MULOP        72  WRTSTA
28  PRIMRY       73  READ
29  ↑            74  INLIST
30  (            75  WRITE
31  )            76  IFSTAT
32  ×            77  FORSTA
33  ÷            78  FORCLA
34  SIMBOO       79  FOR
35  LOGVAL       80  FORLST
36  RELATN       81  DO
37  RELAOP       82  FOLSEL
38  <            83  ,
39  =            84  STEP
40  >            85  UNTIL
41  ≠            86  TYPE
42  T            87  TYPLST
43  1            88  REAL
44  PROGRM       89  INTGER
45  BLOCK        90  BOOLEN
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TEST  SYNTAX 
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BACKUS  NORMAL  FORM  -  PROSE 


VARI   ::= LETTER | VARI LETTER
INTEGR ::= DIGIT | INTEGR DIGIT
FACTOR ::= VARI | INTEGR | ( AREXP )
TERM   ::= FACTOR | TERM MULOP FACTOR
AREXP  ::= TERM | AREXP ADDOP TERM
ASSIGN ::= VARI = AREXP
PROG   ::= ASSIGN | PROG ; ASSIGN


NUMERIC

 7   5   1   6   2   5   6   0   0   0
 7   7   1   8   2   7   8   0   0   0
10   9   1   5   2   7   2  10  11  12
 8  13   1   9   2  13  14   9   0   0
 8  11   1  13   2  11  15  13   0   0
 6  16   1   5  17  11   0   0   0   0
 8  18   1  16   2  18  19  16   0   0
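
The numeric form of the test grammar can be reproduced with a short Python sketch. It is inferred from the table above rather than transcribed from IDENTSYNTAX: it assumes that the four notation symbols are entered first (as IDENTBASE does), that every other symbol is numbered in order of first appearance, and that the first element of each row is one more than the number of symbols in the rule. The function name `encode_rules` is hypothetical.

```python
def encode_rules(rules, preset=("::=", "|", ":[", "]:")):
    """Number symbols in order of first appearance and encode each rule.

    Each row is: 1 + symbol count, the symbol numbers, then zero padding.
    """
    numbers = {sym: i + 1 for i, sym in enumerate(preset)}
    coded = []
    for rule in rules:
        row = []
        for sym in rule.split():
            if sym not in numbers:
                numbers[sym] = len(numbers) + 1   # next free identification number
            row.append(numbers[sym])
        coded.append([len(row) + 1] + row)
    width = max(len(r) for r in coded)
    return [r + [0] * (width - len(r)) for r in coded], numbers
```

Applied to the seven rules of the test grammar in Backus Normal Form, with the symbols of each rule separated by blanks, this numbering gives VARI the number 5 and PROG the number 18, in agreement with the rows shown above.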

IRONS' NOTATION - PROSE

VARI   ::= LETTER :[ LETTER ]:
INTEGR ::= DIGIT :[ DIGIT ]:
FACTOR ::= VARI | INTEGR | ( AREXP )
TERM   ::= FACTOR :[ MULOP FACTOR ]:
AREXP  ::= TERM :[ ADDOP TERM ]:
ASSIGN ::= VARI = AREXP
PROG   ::= ASSIGN :[ ; ASSIGN ]:


NUMERIC

 7   5   1   6   3   6   4   0   0   0   0   0
 7   7   1   8   3   8   4   0   0   0   0   0
10   9   1   5   2   7   2  10  11  12   0   0
 8  13   1   9   3  14   9   4   0   0   0   0
 8  11   1  13   3  15  13   4   0   0   0   0
 6  16   1   5  17  11   0   0   0   0   0   0
 8  18   1  16   3  19  16   4   0   0   0   0
EXTERNAL          PRIMARY            OVERFLOW

 1  ::=             4    0   13        1    0   18
 2  |              19    0   14
 3  :[             24    1    8
 4  ]:             37    0   15
 5  VARI           42    0   17
 6  LETTER         45    0    7
 7  INTEGR         58    0    1
 8  DIGIT          73    0   10
 9  FACTOR         75    0   12
10  (              82    0    2
11  AREXP          83    0   19
12  )              86    0   11
13  TERM          107    0    9
14  MULOP         144    0    4
15  ADDOP         153    0    6
16  ASSIGN        162    0    5
17  =             191    0   16
18  PROG          198    0    3
19  ;

APPENDIX  C 


TEST  EXAMPLES 

THE  CONVENTIONAL  ALGORITHM 

The parse of a statement is reconstructed from the
level numbers and symbols printed by the conventional
algorithm. The parse is displayed with the aid of an array
which has a column for each terminal character and a row
for each level number.

The level numbers indicate the row in which a symbol will
appear. Level numbers are opened by terminal characters
and closed by non-terminal symbols. An open level number
appears on the left of a column and a closed level number
on the right of a column. A terminal symbol will appear in
its column, and metaresults will occupy one or more columns
of the row indicated by the corresponding level number. Any
symbol associated with a closed level number t is defined
by the symbol(s) associated with level number t + 1.

The conventional algorithm may fail to choose the
correct analysis of a statement immediately. Thus, it may
be necessary to modify the output by removing all listings
from the first up to, but excluding, the last listing of a
terminal character. The modified output is converted by the
following procedure.

Assume t is the last level number which has been
either opened or closed; t = 1 initially. When a terminal
character with level number n is encountered, the level
numbers t < x < n are opened on the left of the column
for that terminal character and the terminal symbol is
placed in its column and row location. If a metaresult
with level number t is encountered, level number t + 1
is closed if it refers to a terminal character, the
metaresult is written in row t, and t is closed on the
right of the column of the last terminal character processed.
When level number 1 is closed the structure is complete.
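
The rule that a symbol at level t is defined by the symbols at level t + 1 can also be used to rebuild the parse as a tree instead of an array. The following Python sketch is an illustration, not the procedure of the thesis: it assumes the listing has already been modified to remove rejected analyses, and the triple format (level, symbol, terminal flag) and the function name `build_tree` are hypothetical.

```python
def build_tree(listing):
    """listing: (level, symbol, is_terminal) triples in printed order.

    Returns nested (symbol, children) tuples; terminals have no children.
    """
    stack = []                                    # pending (level, node) pairs
    for level, symbol, is_terminal in listing:
        children = []
        if not is_terminal:
            # a metaresult at level t is defined by the symbols at level t + 1
            while stack and stack[-1][0] == level + 1:
                children.insert(0, stack.pop()[1])
        stack.append((level, (symbol, children)))
    return [node for _, node in stack]
```

For a complete parse, the returned list holds a single tree whose root is the level 1 metaresult.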
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CONVENTIONAL  ARRAYS  -  TEST  SYNTAX 


COLUMNS             MATRIX

1  VARI             0   1  2  3  4  5  6  7
2  INTEGR           1   1  0  1  1  1  1  1
3  FACTOR           2   0  1  1  1  1  1  1
4  TERM             3   0  0  1  1  1  1  1
5  AREXP            4   0  0  0  0  0  0  1
6  ASSIGN           5   0  0  0  0  0  1  1
7  PROG

ROWS

1  LETTER
2  DIGIT
3  (
3  )
3  MULOP
3  ADDOP
4  ;
5  =

SYNTAX TYPE TABLE

NUM  TYPE    INDEX  TERM  LKFR

 1   VARI       5     0     1
 2   INTEGR     7     0     3
 3   FACTOR     9     0     5
 4   TERM      13     0    10
 5   AREXP     11     0    13
 6   ASSIGN    16     0    16
 7   PROG      18     0    19
 8   LETTER     6     1     6
 9   DIGIT      8     1     8
10   (         10     1    10
11   )         12     1    12
12   MULOP     14     1    14
13   ADDOP     15     1    15
14   =         17     1    17
15   ;         19     1    19

SYNTAX STRUCTURE TABLE

INDEX  TYCD  STRC  SUCC  ALTR

  1      6     1     2     0
  2      6     1     2     1
  3      8     1     4     0
  4      8     1     4     1
  5      5     1     0     6
  6      7     1     0     7
  7     10     0     8     0
  8     11     0     9     0
  9     12     1     0     0
 10      9     1    11     0
 11     14     0    12     1
 12      9     1    11     0
 13     13     1    14     0
 14     15     0    15     1
 15     13     1    14     0
 16      5     0    17     0
 17     17     0    18     0
 18     11     1     0     0
 19     16     1    20     0
 20     19     0    21     1
 21     16     1    20     0
ANALYZE 'A=B+1;C=D'

4  A  TERMINAL
3  VARI
3  =  TERMINAL
7  B  TERMINAL
6  VARI
5  FACTOR
4  TERM
4  +  TERMINAL
7  1  TERMINAL
6  INTEGR
5  FACTOR
4  TERM
3  AREXP
2  ASSIGN
2  ;  TERMINAL
4  C  TERMINAL
3  VARI
3  =  TERMINAL
7  D  TERMINAL
6  VARI
5  FACTOR
4  TERM
3  AREXP
2  ASSIGN
1  PROG


1                                              PROG                                              1
2                       ASSIGN                       2      ;     2            ASSIGN            2
3  VARI  3      =     3            AREXP             3            3  VARI  3      =     3  AREXP 3
4    A   4            4  TERM  4      +     4  TERM  4            4    C   4            4  TERM  4
                      5 FACTOR 5            5 FACTOR 5                                  5 FACTOR 5
                      6  VARI  6            6 INTEGR 6                                  6  VARI  6
                      7    B   7            7    1   7                                  7    D   7

ANALYZE 'BEGIN A ← A + 1 END .'

 5  BEGIN  TERMINAL
 4  BEGIN  TERMINAL
11  A  TERMINAL
10  IDENTF
 9  LABEL
11  A  TERMINAL
10  IDENTF
 9  LABEL
14  A  TERMINAL
13  IDENTF
12  VARIAB
12  ←  TERMINAL
11  LFTPTL
18  A  TERMINAL
17  IDENTF
16  VARIAB
15  PRIMRY
14  FACTOR
13  TERM
14  +  TERMINAL
13  ADDOP
19  1  TERMINAL
18  UNSINT
19  1  TERMINAL
18  UNSINT
17  INTEGR
16  NUMBER
15  PRIMRY
14  FACTOR
13  TERM
12  SIMAEX
11  ARTEXP
10  ASSSTA
 9  UNLBST
 8  BASSTA
 7  UNCSTA
 6  STATEM
 5  SIMPTL
 5  END  TERMINAL
 4  COMPTL
 3  UNLCMS
 2  COMPST
 2  .  TERMINAL
 1  PROGRM
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THE  TRANSITION  DIAGRAM  ALGORITHM 

The parse of a statement produced by the transition
diagram algorithm is displayed using almost exactly the method
employed for the conventional algorithm. The only difference
results from the NO-BACKUP condition, which ensures that the
algorithm will produce no incorrect parses and hence that the
output need not be modified.
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THE  MULTIPLE  PARSE  ALGORITHM 

For each terminal symbol this algorithm produces
vectors of numbers representing the chains of metaresults
which the terminal symbol can define on the basis of the
preceding elements. Two pointers are provided for each
vector. The second pointer numbers the vectors for this
symbol. The first pointer indicates which vector for the
preceding symbol was used to generate the given vector.
If the grammar is unambiguous, there will be only one vector
for the last terminal symbol.

The first step in reconstructing the parse is to
select the vectors determined by the last terminal symbol
and the pointers provided. As for the conventional algorithm,
the parse can be represented by an array. The array is
completed by listing the appropriate symbols where the vector
number indicates the column and the index of the element
indicates the row. If a symbol appears in more than one
column of a row, it should be listed only once, with indicators
used to mark the ends of the appropriate columns. A
metaresult in row t is defined by the elements in row t + 1
which are between the row t indicators.
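
The first step, selecting one vector per terminal symbol by following the pointers back from the last symbol, can be sketched in Python. The data layout is an assumption made for illustration: each terminal symbol has a list of triples (first pointer, second pointer, chain), and the function name `select_vectors` is hypothetical.

```python
def select_vectors(table):
    """table[k] lists the vectors for terminal k as (prev, number, chain).

    prev is the number of the vector for the preceding terminal that
    generated this one (ignored for the first terminal). Returns one
    chain per terminal, found by following pointers back from the last.
    """
    last = table[-1]
    if len(last) != 1:
        raise ValueError("ambiguous or failed parse: %d final vectors" % len(last))
    prev, _, chain = last[0]
    chosen = [chain]
    for vectors in reversed(table[:-1]):
        by_number = {number: (p, c) for p, number, c in vectors}
        prev, chain = by_number[prev]      # vector that generated the later one
        chosen.insert(0, chain)
    return chosen
```

The selected chains, taken in order, supply the columns of the array described above; vectors that are never reached from the final vector are simply discarded.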
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MULTIPLE  PARSE  ARRAYS  -  TEST  SYNTAX 
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